[PATCH 3 of 3 RFC] localrepo: use ctx.size comparisons to speed up status

Sun Jul 25 11:02:43 CDT 2010

On Sun, 2010-07-25 at 11:12 +0900, Nicolas Dumazet wrote:
> # HG changeset patch
> # User Nicolas Dumazet <nicdumz.commits at gmail.com>
> # Date 1278816052 -32400
> # Node ID 42b4ba8013abce794478c689201399ecf8294540
> # Parent  dca39a137eaa3f107c6b6419540a0afca702d3eb
> localrepo: use ctx.size comparisons to speed up status
> 
> Comparing sizes is cheaper than comparing file contents, as it does not
> involve reading the file on disk or from the filelog.
> 
> It is however not always possible: some extensions, or encode filters,
> change data when extracting it to the working directory.
> _cancomparesize is meant to detect cases where such comparisons are not
> possible. A _cancomparesize() call is cheap, as _loadfilter is caching
> its results in filterpats.
> 
> Unwrapping the complex inlined boolean comparisons produces longer code,
> but boolean logic has not been changed, except for the size check
> before ctx.cmp calls.
> 
> diff --git a/hgext/keyword.py b/hgext/keyword.py
> --- a/hgext/keyword.py
> +++ b/hgext/keyword.py
> @@ -502,6 +502,11 @@
>                                False, True)
>              return n
>  
> +        def _cancomparesize(self):
> +            # keywords affect data size, comparing wdir and filelog size does
> +            # not make sense
> +            return False
> +

I think this would work out better as a helper function of some sort
that actually does the comparison itself, possibly in filelog.
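
A rough sketch of what such a helper might look like, to make the idea
concrete. The names contents_differ and FakeFileCtx are hypothetical and
not part of Mercurial; only size(), data() and cmp() mirror the calls
used in the quoted patch, and the stand-in class exists purely so the
sketch runs on its own:

```python
class FakeFileCtx(object):
    """Hypothetical stand-in for a file context; not Mercurial's class."""

    def __init__(self, text):
        self._text = text
        self.reads = 0  # counts how often the contents were fetched

    def size(self):
        # Cheap: does not touch the contents.
        return len(self._text)

    def data(self):
        self.reads += 1
        return self._text

    def cmp(self, text):
        # Same convention as in the patch: a true result means "different".
        return self.data() != text


def contents_differ(f1, f2, can_compare_size=True):
    """Return True if f1 and f2 differ, checking sizes before contents.

    can_compare_size is an assumed flag standing in for the patch's
    _cancomparesize() result; pass False when filters or extensions
    change the data size on checkout.
    """
    if can_compare_size and f1.size() != f2.size():
        return True  # sizes differ, no need to read either file
    return bool(f1.cmp(f2.data()))
```

With something along these lines, callers would not need to know whether
a size comparison is meaningful; they would only ask whether two files
differ.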

> +
> +            if listclean:
> +                appendclean = clean.append
> +            else:
> +                def appendclean(fn): pass
> +            appendmodified = modified.append
> +

This is a bit too clever. Python function calls are slow, so calling a
no-op function for every clean file costs more than testing a flag
inline:

$ python -m timeit -s 'a = []; x = False' -s 'def aa(x): pass' 'for i in xrange(1000000): aa(1)'
10 loops, best of 3: 239 msec per loop

$ python -m timeit -s 'a = []; x = False' -s 'aa = lambda x: None' 'for i in xrange(1000000): aa(1)'
10 loops, best of 3: 240 msec per loop

$ python -m timeit -s 'a = []; x = False; aa = a.append' 'for i in xrange(1000000):
  if x: aa(1)'
10 loops, best of 3: 62.6 msec per loop

$ python -m timeit -s 'a = []; x = True; aa = a.append' 'for i in xrange(1000000):
  if x: aa(1)'
10 loops, best of 3: 133 msec per loop
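
In other words, an inline flag test around a bound append seems
preferable to dispatching through a do-nothing function. A minimal
sketch of that shape, not the actual status code: classify and is_clean
are hypothetical names used only for illustration, with is_clean
standing in for the flags/node/content checks in the loop below.

```python
def classify(names, is_clean, listclean):
    """Split names into (modified, clean) lists.

    Clean files are only recorded when listclean is true; otherwise
    nothing is called for them at all.
    """
    modified, clean = [], []
    # Bind the append methods once, outside the loop.
    appendmodified = modified.append
    appendclean = clean.append
    for fn in names:
        if not is_clean(fn):
            appendmodified(fn)
        elif listclean:
            # Plain flag test instead of calling a no-op function.
            appendclean(fn)
    return modified, clean
```

For example, classify(['a', 'b', 'c'], lambda fn: fn != 'b', False)
gives (['b'], []), and no function is called to discard 'a' and 'c'.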

>              for fn in mf2:
>                  if fn in mf1:
> -                    if (mf1.flags(fn) != mf2.flags(fn) or
> -                        (mf1[fn] != mf2[fn] and
> -                         (mf2[fn] or ctx1[fn].cmp(ctx2[fn].data())))):
> -                        modified.append(fn)
> -                    elif listclean:
> -                        clean.append(fn)
> +                    action = appendmodified
> +                    if mf1.flags(fn) == mf2.flags(fn):
> +                        if mf1[fn] == mf2[fn]:
> +                            action = appendclean
> +                        elif not mf2[fn]:
> +                            f1 = ctx1[fn]
> +                            f2 = ctx2[fn]
> +                            sizematch = not checksize or f1.size() == f2.size()
> +                            if sizematch and not f1.cmp(f2.data()):
> +                                action = appendclean
> +                    action(fn)
>                      del mf1[fn]
>                  else:
>                      added.append(fn)

-- 
Mathematics is the supreme nostalgia of our time.