[PATCH RFC] faster dirstate walks

Theodore Ts'o tytso at mit.edu
Thu Aug 25 14:50:01 CDT 2005


On Thu, Aug 25, 2005 at 02:57:46PM -0400, Chris Mason wrote:
> > You should sort anyway before calling stat because that tends to
> > reduce seek time with directory hashing on ext3 (this is crucial for
> > cold-cache performance).  Sorting by inode number is even better, but
> > Python probably doesn't provide it.

It will _only_ make it faster if you sort by inode number.  Sorting by
anything else will probably not make a difference as far as filesystem
performance is concerned.

> It might make ext3 dir hashing faster, but it will make other
> filesystems slower. Sorting by inode number is better, but you can't do
> that before the stat...

You can get the inode number from the readdir() system call; it might
not be available via Python interfaces, however.
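
As a minimal sketch of what that could look like from Python, assuming a
modern interpreter (3.5 or later) where os.scandir() exists: on Linux,
DirEntry.inode() reports the inode number that came back with the directory
entry, so no per-file stat is needed; on some other platforms it may cost a
stat call.  The helper name names_with_inodes is made up for illustration.

```python
import os

def names_with_inodes(path):
    """Return (inode, name) pairs for the entries directly inside path."""
    # os.scandir() wraps the platform's directory-reading call; on Linux the
    # inode number comes from the directory entry itself, not from stat().
    with os.scandir(path) as it:
        return [(entry.inode(), entry.name) for entry in it]
```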

> Bryan talked me into the sort on irc, it's not a huge performance
> difference.

If you have a directory with a huge number of files (on the order of
tens or hundreds of thousands) and a cold cache, sorting by inode
number will make a very noticeable difference.  So, for example, for
Maildir directories, you really, really want to do this.  I don't think that
configuration is likely to show up in most source trees in real life,
though, so I agree that it's probably not going to be worth it to try
to sort by inode number, unless it's trivially easy to do.
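
For illustration only, here is a rough sketch of the sort-then-stat order in
Python.  It assumes os.scandir() (Python 3.5 or later) is available; the
function name lstat_in_inode_order is hypothetical and this is not
Mercurial's actual dirstate code.

```python
import os

def lstat_in_inode_order(path):
    """lstat() each entry in path, visiting entries in ascending inode order."""
    # Read the whole directory first, then sort by the inode number carried
    # in each directory entry.
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.inode())
    # Stat in inode order; the returned dict keeps that order.
    return {e.name: os.lstat(e.path) for e in entries}
```

The whole directory is read before any lstat() call is made, so the cost is
holding one entry per file in memory while sorting.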

						- Ted

