[PATCH RFC] faster dirstate walks

Bryan O'Sullivan bos at serpentine.com
Thu Aug 25 13:38:49 CDT 2005


On Thu, 2005-08-25 at 14:29 -0400, Chris Mason wrote:

> 1) os.walk stats every file.  dirstate.changes then stats every file again.

Good.

> 2) os.walk yields every file and subdir to dirstate.traverse who yields every 
> file and everything in the dirstate map.  dirstate.walk then 
> filters this mass and yields every file to the caller.  There should be
> fewer steps in here.

Do you think that makes a performance difference, or do you just not
like it?

> I've turned os.walk into something hg specific that takes all the dirstate 
> ignore and matching rules into account.  The new function also takes an 
> optional function (interesting()) the caller can supply to help filter out 
> files it doesn't care about.

What's the difference between interesting and matchfn?

> Testing has been extremely light, this is really just proof of concept code.  
> One big functional change is the results of dirstate.walk are no longer sorted.  
> If the caller really wants sorted results (most don't I think) they should 
> collect into an array and sort themselves.

No, callers almost always want the results sorted.  The reason is that
most UI functions use the results of a walk directly.  Humans like
seeing their output in sorted order.

> +        def __walk(s):

Please find a different name for this.

> +                        if not self.ignore(ds) and \

We don't use line continuations.  Please wrap these kinds of clauses in
an extra set of parens instead.

Overall, I think the change is sensible.  Unfortunately, it's not any
less voodoo-like than the current walk code.  If you could add a few
descriptive comments here and there, so that future readers don't get
lost, that would be a good thing.

	<b



More information about the Mercurial mailing list