[PATCH RFC] faster dirstate walks
Theodore Ts'o
tytso at mit.edu
Thu Aug 25 14:50:01 CDT 2005
On Thu, Aug 25, 2005 at 02:57:46PM -0400, Chris Mason wrote:
> > You should sort anyway before calling stat because that tends to
> > reduce seek time with directory hashing on ext3 (this is crucial for
> > cold-cache performance). Sorting by inode number is even better, but
> > Python probably doesn't provide it.
It will _only_ make it faster if you sort by inode number. Sorting by
anything else will probably not make a difference as far as filesystem
performance is concerned.
> It might make ext3 dir hashing faster, but it will make other
> filesystems slower. Sorting by inode number is better, but you can't do
> that before the stat...
You can get the inode number from the readdir() system call; it might
not be available via Python interfaces, however.
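As a minimal sketch of the idea in modern Python (an assumption beyond this 2005 thread): since Python 3.5, `os.scandir()` yields `DirEntry` objects whose `inode()` method returns the `d_ino` value that readdir() already supplied, with no extra system call on Unix (Windows may need one). The helper name `stat_in_inode_order` is hypothetical. It reads the directory once, sorts the entries by inode number, and only then calls `lstat()` on each one.

```python
import os


def stat_in_inode_order(path):
    """Hypothetical helper: lstat every entry of `path`, visiting the
    entries in ascending inode-number order instead of readdir() order."""
    with os.scandir(path) as it:
        entries = list(it)
    # On Unix, DirEntry.inode() returns the d_ino field filled in by
    # readdir(), so computing the sort key costs no stat() calls.
    entries.sort(key=lambda e: e.inode())
    results = {}
    for entry in entries:
        # Stat in inode order; on a cold cache this is expected to reduce
        # seeking across the inode tables compared to name/hash order.
        results[entry.name] = os.lstat(entry.path)
    return results
```

For example, `stat_in_inode_order(".")` returns a dict mapping each name in the current directory to its `os.stat_result`, in the order the stats were issued.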
> Bryan talked me into the sort on irc, it's not a huge performance
> difference.
If you have a directory with a huge number of files (on the order of
tens or hundreds of thousands) and a cold cache, sorting by inode number will
make a very noticeable difference. So for example for Maildir
directories, you really, really want to do this. I don't think that
configuration is likely to show up in most source trees in real life,
though, so I agree that it's probably not going to be worth it to try
to sort by inode number, unless it's trivially easy to do.
- Ted