[PATCH 3 of 3 RFC] localrepo: use ctx.size comparisons to speed up status

Nicolas Dumazet nicdumz at gmail.com
Sun Jul 25 17:33:48 CDT 2010


2010/7/26 Matt Mackall <mpm at selenic.com>:
> On Sun, 2010-07-25 at 11:12 +0900, Nicolas Dumazet wrote:
>> # HG changeset patch
>> # User Nicolas Dumazet <nicdumz.commits at gmail.com>
>> # Date 1278816052 -32400
>> # Node ID 42b4ba8013abce794478c689201399ecf8294540
>> # Parent  dca39a137eaa3f107c6b6419540a0afca702d3eb
>> localrepo: use ctx.size comparisons to speed up status
>>
>> Comparing sizes is cheaper than comparing file contents, as it does not
>> involve reading the file on disk or from the filelog.
>>
>> It is however not always possible: some extensions, or encode filters,
>> change data when extracting it to the working directory.
>> _cancomparesize is meant to detect cases where such comparisons are not
>> possible. A _cancomparesize() call is cheap, as _loadfilter is caching
>> its results in filterpats.
>>
>> Unwrapping the complex inlined boolean comparisons produces longer code,
>> but boolean logic has not been changed, except for the size check
>> before ctx.cmp calls.
>>
>> diff --git a/hgext/keyword.py b/hgext/keyword.py
>> --- a/hgext/keyword.py
>> +++ b/hgext/keyword.py
>> @@ -502,6 +502,11 @@
>>                                False, True)
>>              return n
>>
>> +        def _cancomparesize(self):
>> +            # keywords affect data size, comparing wdir and filelog size does
>> +            # not make sense
>> +            return False
>> +
>
> Somehow I think this would work out better as a helper function of some
> sort that actually did the comparison. Possibly in filelog.

I need access to the underlying repo to check whether any encode/decode
filters are active, and I'm not sure how to do that from the filelog.
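
To make the concern concrete, here is a minimal sketch in plain Python (not
Mercurial code) of why sizes cannot be trusted once data is transformed on
its way to the working directory. `expand_keywords` is a hypothetical
stand-in for a decode filter or for what the keyword extension does; the
`$Author$` keyword and the author name are assumptions for illustration.

```python
import re

def expand_keywords(data, author=b"nicdumz"):
    # Hypothetical filter: expand "$Author$" when writing to the working directory.
    return re.sub(rb"\$Author\$", b"$Author: " + author + b" $", data)

# What the filelog would store versus what ends up on disk.
stored = b"# $Author$\nprint('hello')\n"
working = expand_keywords(stored)

# The file is logically unmodified, yet the sizes disagree,
# so a size check alone would wrongly report it as modified.
size_differs = len(stored) != len(working)
```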

What about doing this in filectx (filectx.smartcmp? filectx.sizecmp?)?
It would first compare sizes and, only if they are equal, fall back to
the usual ctx.cmp. Sounds good?
Maybe I should even implement this directly in filectx.cmp?
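
A rough sketch of what I have in mind, written as a standalone function
rather than a real filectx method. The name `sizecmp`, its parameters and
the `cancomparesize` flag are assumptions for illustration; like cmp in the
patch, a true result means "different".

```python
def sizecmp(size1, size2, read1, read2, cancomparesize=True):
    """Return True if the two files differ.

    size1/size2 are cheap-to-obtain sizes; read1/read2 are callables that
    return the full contents and are only invoked when sizes cannot decide.
    """
    if cancomparesize and size1 != size2:
        # Different sizes: contents must differ, no need to read anything.
        return True
    # Same size (or sizes not trustworthy): fall back to a content comparison.
    return read1() != read2()

# Small helper that records whether the expensive read actually happened.
reads = []
def reader(data):
    def read():
        reads.append(data)
        return data
    return read

differs = sizecmp(3, 4, reader(b"abc"), reader(b"abcd"))
```

When filters are active the caller would pass cancomparesize=False, which
degrades to the plain content comparison we have today.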



>
>> +
>> +            if listclean:
>> +                appendclean = clean.append
>> +            else:
>> +                def appendclean(fn): pass
>> +            appendmodified = modified.append
>> +
>
> This is a bit too clever. Python function calls are slow:
>

Oops. I just learned something. I can probably do something about this.

-Nicolas.

> $ python -m timeit -s 'a = []; x = False' -s 'def aa(x): pass' 'for i in xrange(1000000): aa(1)'
> 10 loops, best of 3: 239 msec per loop
>
> $ python -m timeit -s 'a = []; x = False' -s 'aa = lambda x: None' 'for i in xrange(1000000): aa(1)'
> 10 loops, best of 3: 240 msec per loop
>
> $ python -m timeit -s 'a = []; x = False; aa = a.append' 'for i in xrange(1000000):
>  if x: aa(1)'
> 10 loops, best of 3: 62.6 msec per loop
>
> $ python -m timeit -s 'a = []; x = True; aa = a.append' 'for i in xrange(1000000):
>  if x: aa(1)'
> 10 loops, best of 3: 133 msec per loop
>
>
>
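
The same comparison can be reproduced with the timeit module on a loop
shaped like the one in the patch. This is only a sketch: the function names
and the item count are made up for illustration, it uses Python 3 `range`
instead of `xrange`, and absolute timings will vary by machine.

```python
import timeit

N = 100000

def noop(fn):
    pass

def with_noop(items, listclean):
    # Patch style: always call something, possibly a do-nothing function.
    clean = []
    appendclean = clean.append if listclean else noop
    for fn in items:
        appendclean(fn)
    return clean

def with_guard(items, listclean):
    # Alternative: test the flag and skip the call entirely when it is off.
    clean = []
    append = clean.append
    for fn in items:
        if listclean:
            append(fn)
    return clean

items = list(range(N))
t_noop = timeit.timeit(lambda: with_noop(items, False), number=5)
t_guard = timeit.timeit(lambda: with_guard(items, False), number=5)
print("no-op call: %.3fs  guarded: %.3fs" % (t_noop, t_guard))
```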
>>              for fn in mf2:
>>                  if fn in mf1:
>> -                    if (mf1.flags(fn) != mf2.flags(fn) or
>> -                        (mf1[fn] != mf2[fn] and
>> -                         (mf2[fn] or ctx1[fn].cmp(ctx2[fn].data())))):
>> -                        modified.append(fn)
>> -                    elif listclean:
>> -                        clean.append(fn)
>> +                    action = appendmodified
>> +                    if mf1.flags(fn) == mf2.flags(fn):
>> +                        if mf1[fn] == mf2[fn]:
>> +                            action = appendclean
>> +                        elif not mf2[fn]:
>> +                            f1 = ctx1[fn]
>> +                            f2 = ctx2[fn]
>> +                            sizematch = not checksize or f1.size() == f2.size()
>> +                            if sizematch and not f1.cmp(f2.data()):
>> +                                action = appendclean
>> +                    action(fn)
>>                      del mf1[fn]
>>                  else:
>>                      added.append(fn)
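
Since the commit message claims the boolean logic is unchanged apart from
the size check, here is a small sketch that models both versions and
compares them exhaustively. It is not Mercurial code: manifests and file
contexts are replaced by plain values, `old_status` and `new_status` are
hypothetical names, an empty node stands for "unknown, must compare
contents", and `data1 != data2` plays the role of cmp.

```python
from itertools import product

def old_status(flags1, flags2, node1, node2, data1, data2):
    # Original inlined condition from the removed lines.
    if (flags1 != flags2 or
        (node1 != node2 and
         (node2 or data1 != data2))):
        return "modified"
    return "clean"

def new_status(flags1, flags2, node1, node2, data1, data2, checksize=True):
    # Unwrapped version from the added lines, with the size shortcut.
    status = "modified"
    if flags1 == flags2:
        if node1 == node2:
            status = "clean"
        elif not node2:
            sizematch = not checksize or len(data1) == len(data2)
            if sizematch and data1 == data2:
                status = "clean"
    return status

# Enumerate every combination of a few representative values.
cases = list(product(["", "x"], ["", "x"],
                     ["n1", "n2"], ["", "n1", "n2"],
                     [b"a", b"b", b"ab"], [b"a", b"b", b"ab"]))
mismatches = [c for c in cases
              for checksize in (True, False)
              if old_status(*c) != new_status(*c, checksize=checksize)]
```

The size shortcut is only safe in this model because equal contents imply
equal sizes, which is exactly the assumption that filters break.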
>
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>
>



-- 
Nicolas Dumazet — NicDumZ

