[PATCH 4 of 4] dirstate: avoid use of zip on big lists

Matt Mackall mpm at selenic.com
Fri Nov 30 18:07:33 CST 2012


On Fri, 2012-11-30 at 14:20 -0800, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1354313955 28800
> # Node ID ab0ec24445a5402cfc3322ac515c1ab3368b833c
> # Parent  59ca9fefdb7d956cb76d04f3acc420289736957e
> dirstate: avoid use of zip on big lists
> 
> In a clean working directory containing 170,000 tracked files, this
> improves performance of "hg --time diff" from 1.69 seconds to 1.43.

I'd rather see the results of perfstatus.

> This idea is due to Siddharth Agarwal.
> 
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -696,8 +696,9 @@
>          # step 3: report unseen items in the dmap hash
>          if not skipstep3 and not exact:
>              visit = sorted([f for f in dmap if f not in results and matchfn(f)])
> -            for nf, st in zip(visit, util.statfiles([join(i) for i in visit])):
> -                results[nf] = st
> +            nf = iter(visit).next
> +            for st in util.statfiles([join(i) for i in visit]):
> +                results[nf()] = st

That's not pretty, is it? The parallel iteration over visit is quite
confusing. Slightly better as:

# iter() keeps this valid whether statfiles returns a list or a generator
nextstat = iter(util.statfiles([join(i) for i in visit])).next
for nf in visit:
    results[nf] = nextstat()
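For comparison, here are the three loop shapes side by side as a small Python 3 sketch. The `statfiles` below is a hypothetical pure-Python stand-in for Mercurial's `util.statfiles` (one `os.lstat` result per path, or `None` for a missing path), not its real implementation, and the `report_*` and `join` names are illustrative. Python 3 spells the bound `.next` method as `.__next__`. Note also that Python 2's `zip` built the entire list of pairs up front, which is the overhead the patch is trying to avoid; Python 3's `zip` is lazy.

```python
import os
import tempfile


def statfiles(paths):
    """Hypothetical stand-in for util.statfiles: yield os.lstat() of each
    path in order, or None if the path does not exist."""
    for p in paths:
        try:
            yield os.lstat(p)
        except (FileNotFoundError, NotADirectoryError):
            yield None


def report_zip(visit, join):
    # Original loop: pair names with stats via zip (a full list in Python 2).
    results = {}
    for nf, st in zip(visit, statfiles([join(i) for i in visit])):
        results[nf] = st
    return results


def report_nextname(visit, join):
    # Patch as posted: iterate over stats, pull the matching name on the side.
    results = {}
    nextname = iter(visit).__next__
    for st in statfiles([join(i) for i in visit]):
        results[nextname()] = st
    return results


def report_nextstat(visit, join):
    # Suggested shape: iterate over names, pull the matching stat on the side.
    results = {}
    nextstat = iter(statfiles([join(i) for i in visit])).__next__
    for nf in visit:
        results[nf] = nextstat()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a"), "w") as fh:
            fh.write("xyz")
        join = lambda f: os.path.join(root, f)
        visit = sorted(["a", "missing"])
        for report in (report_zip, report_nextname, report_nextstat):
            res = report(visit, join)
            print(report.__name__, res["a"].st_size, res["missing"])
```

The last shape keeps `visit` as the loop driver, so a reader can see at a glance that each name gets exactly one stat. All three assume `statfiles` yields exactly one result per path, in the same order.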

Further, if we're motivated by space overhead, the list passed to
statfiles could be a generator. That tends to be a loss on lists that
aren't gigantic, though.
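A sketch of the generator variant, again with a hypothetical pure-Python `statfiles` stand-in that accepts any iterable (a C implementation may require a real list, in which case this would need an API change). The only difference is the brackets: a generator expression builds each joined path on demand instead of materializing all of them first. `stat_visit` is an illustrative name, and `sys.getsizeof` measures only the container object, not the path strings it refers to.

```python
import os
import sys
import tempfile


def statfiles(paths):
    """Hypothetical stand-in for util.statfiles that accepts any iterable."""
    for p in paths:
        try:
            yield os.lstat(p)
        except (FileNotFoundError, NotADirectoryError):
            yield None


def stat_visit(visit, join, lazy=False):
    if lazy:
        paths = (join(i) for i in visit)  # generator: paths built on demand
    else:
        paths = [join(i) for i in visit]  # list: all paths built up front
    return list(statfiles(paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "a"), "w").close()
        join = lambda f: os.path.join(root, f)
        visit = ["a", "missing"]
        # Same results either way; only how the argument is built differs.
        print(stat_visit(visit, join) == stat_visit(visit, join, lazy=True))
    big = ["f%06d" % i for i in range(170000)]
    # Container size only: one pointer per element for the list,
    # a small fixed-size object for the generator.
    print(sys.getsizeof([b for b in big]), sys.getsizeof(b for b in big))
```

On a 64-bit CPython build the list container costs roughly 8 bytes per element on top of the strings themselves, so the saving only matters for very large working copies; for smaller ones the generator's per-item overhead can dominate.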

As this is apparently the only user of statfiles, perhaps a better API
is possible.
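One possible shape for such an API, sketched in Python 3: have the function yield `(name, stat)` pairs itself, so callers never keep two sequences in step and the dirstate loop collapses to a `dict.update`. `statnames` is a hypothetical name, not an existing Mercurial function, and mapping a missing path to `None` is an assumption chosen to match what the dirstate loop stores.

```python
import os
import tempfile


def statnames(names, join):
    """Hypothetical API: yield (name, lstat result or None) for each name,
    keeping each name paired with its own stat."""
    for name in names:
        try:
            yield name, os.lstat(join(name))
        except (FileNotFoundError, NotADirectoryError):
            yield name, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "a"), "w").close()
        visit = sorted(["a", "missing"])
        results = {}
        # The caller no longer zips or hand-advances a second iterator.
        results.update(statnames(visit, lambda f: os.path.join(root, f)))
        print(sorted(results), results["missing"])
```

Pairing inside the generator means names and stats can never drift out of step, a property that both parallel-iteration loops rely on without checking (zip, for instance, silently truncates on a length mismatch).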

-- 
Mathematics is the supreme nostalgia of our time.