[PATCH 10 of 10 V2] treemanifest: optimize diff using the matcher

Durham Goode durham at fb.com
Wed Mar 8 17:44:46 EST 2017



On 3/8/17 1:01 PM, Martin von Zweigbergk wrote:
> On Tue, Mar 7, 2017 at 7:22 PM, Durham Goode <durham at fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham at fb.com>
>> # Date 1488943242 28800
>> #      Tue Mar 07 19:20:42 2017 -0800
>> # Node ID 541bf866729342f534bac425bd8f01b9fe7564e8
>> # Parent  611fac63adb09c326912e56df59c828ad12ffd9f
>> treemanifest: optimize diff using the matcher
>>
>> This optimizes treemanifest.diff() to limit the tree traversal based on the
>> provided matcher. According to Martin's testing, `hg status --rev .~1 --rev .
>> foo/` goes from 1.3s to 0.18s on a tree version of Mozilla central. I'd expect
>> and even greater saving on larger internal repos at big companies.
>>
>> A previous patch added test coverage for treemanifest diff with patterns.
>>
>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>> --- a/mercurial/manifest.py
>> +++ b/mercurial/manifest.py
>> @@ -1053,10 +1053,6 @@ class treemanifest(object):
>>          the nodeid will be None and the flags will be the empty
>>          string.
>>          '''
>> -        if match:
>> -            m1 = self._matches(match)
>> -            m2 = m2._matches(match)
>> -            return m1.diff(m2, clean=clean)
>>          result = {}
>>          emptytree = treemanifest()
>>          def _diff(t1, t2):
>> @@ -1065,26 +1061,31 @@ class treemanifest(object):
>>              t1._load()
>>              t2._load()
>>              for d, m1 in t1._dirs.iteritems():
>> -                m2 = t2._dirs.get(d, emptytree)
>> -                _diff(m1, m2)
>> +                if not match or match.visitdir(os.path.join(t1.dir(), d[:-1])):
>> +                    m2 = t2._dirs.get(d, emptytree)
>> +                    _diff(m1, m2)
>>
>>              for d, m2 in t2._dirs.iteritems():
>>                  if d not in t1._dirs:
>> -                    _diff(emptytree, m2)
>> +                    if (not match or match.visitdir(os.path.join(t2.dir(),
>> +                                                                 d[:-1]))):
>> +                        _diff(emptytree, m2)
>
> Instead of doing it twice here, we can do at once at the top of the
> function. That will also handle the (rare) case of an excluded root
> directory. It's unclear if the extra function call will be any worse
> of doing it that way will be any worse than the extra path joins of
> doing it this way, so I'd go with the shorter form. Also, we should
> add an "or '.'" to handle the root directory (perhaps we'll change
> visitdir some day to use empty string for root directory and to accept
> the trailing slash for others). So, in all, something like this at the
> top of _diff:
>
>             if match and not match.visitdir(t1.dir()[:-1] or '.'):
>                 return
>
> That's untested.

Will do

> Btw (and unrelated to your patches), it looks like "clean" should be
> checked in that existing early return at the top of the function,
> don't you think?

I think you're right

>>
>>              for fn, n1 in t1._files.iteritems():
>> -                fl1 = t1._flags.get(fn, '')
>> -                n2 = t2._files.get(fn, None)
>> -                fl2 = t2._flags.get(fn, '')
>> -                if n1 != n2 or fl1 != fl2:
>> -                    result[t1._subpath(fn)] = ((n1, fl1), (n2, fl2))
>> -                elif clean:
>> -                    result[t1._subpath(fn)] = None
>> +                if not match or match(os.path.join(t1.dir(), fn)):
>
> Copy the _subpath call from below instead of calling join(). Is it
> worth caching that call in a variable (that's initially None,
> probably)?

Will do

>> +                    fl1 = t1._flags.get(fn, '')
>> +                    n2 = t2._files.get(fn, None)
>> +                    fl2 = t2._flags.get(fn, '')
>> +                    if n1 != n2 or fl1 != fl2:
>> +                        result[t1._subpath(fn)] = ((n1, fl1), (n2, fl2))
>> +                    elif clean:
>> +                        result[t1._subpath(fn)] = None
>>
>>              for fn, n2 in t2._files.iteritems():
>>                  if fn not in t1._files:
>> -                    fl2 = t2._flags.get(fn, '')
>> -                    result[t2._subpath(fn)] = ((None, ''), (n2, fl2))
>> +                    if not match or match(os.path.join(t2.dir(), fn)):
>> +                        fl2 = t2._flags.get(fn, '')
>> +                        result[t2._subpath(fn)] = ((None, ''), (n2, fl2))
>>
>>          _diff(self, m2)
>>          return result
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=AkIW-FvzdPLnH5EVSsn9KCAjciMPVvlgSXpEaEnD6Rg&s=R0EKehe5lsFC93IpeE23tC9J2LUiZkEjAmHyAduMcSk&e=


More information about the Mercurial-devel mailing list