[PATCH 2 of 4] addremove: don't call lexists, isdir, and islink

Matt Mackall mpm at selenic.com
Tue Feb 5 15:30:54 CST 2013


On Mon, 2013-02-04 at 19:20 -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham at fb.com>
> # Date 1360015580 28800
> # Node ID f8d0ae2b9805840a73f4c3e44d64043cf47691de
> # Parent  3457575aad13d6e7b8ccb9a45f9e49cf1f2b27fe
> addremove: don't call lexists, isdir, and islink
> 
> The dirstate walk results contain the stat information for each path, so we
> don't need to query it again.  On a large repo this makes addremove go from
> 8.35 seconds to 7.1 (15%).

...
> -    for abs in repo.walk(m):
...
> +    walkresults = repo.dirstate.walk(m, sorted(ctx.substate), True, False)
> +    for abs in sorted(walkresults):
...

Uhh, what's this bit about??

repo.walk:

    def walk(self, match, node=None):
        '''                                                                                                                                                                                       
        walk recursively through the directory tree or a given                                                                                                                                    
        changeset, finding all files matched by the match                                                                                                                                         
        function                                                                                                                                                                                  
        '''
	return self[node].walk(match)

workingcontext.walk:

    def walk(self, match):
        return sorted(self._repo.dirstate.walk(match, sorted(self.substate),
                                               True, False))

I see, we're sneaking in a nano-optimization that's busting through two
layers of abstraction to avoid the overhead of two function calls.

As slow as Python is, it's not going to be a measurable performance
improvement as it's outside the loop. I just spent ten minutes trying to
understand this change. At about .1us saved per addremove, we're going
to need about 6000 Facebook-sized projects doing addremove on every
commit to break even. And that's assuming no one else ever tries to
understand this line of code.

Let's not do that sort of thing.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list