[PATCH] revert: remove set(mf) because it's O(repo)

Durham Goode durham at fb.com
Thu Mar 2 13:21:07 EST 2017



On 3/2/17 9:57 AM, Martin von Zweigbergk wrote:
> On Wed, Mar 1, 2017 at 8:01 PM, Durham Goode <durham at fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham at fb.com>
>> # Date 1488426665 28800
>> #      Wed Mar 01 19:51:05 2017 -0800
>> # Node ID a8458fe51a9d155f1daeaffdcf503e674d4d4588
>> # Parent  b787c41767339158927232ec7a9092196e887453
>> revert: remove set(mf) because it's O(repo)
>
> O(repo) sounds misleading to me, because it sounds like number of
> revisions matter too. I'd change it to O(manifest)
>
>>
>> The revert code had a 'set(manifest)' line in it, which has a runtime equivalent
>> to the size of the working copy. With alternative manifest implementations, like
>> treemanifest, this can be extra expensive. Let's rewrite it to be O(changes)
>> instead of O(working copy size).
>
> Thanks! This was one of the remaining few pieces I never got around to
> fixing two years ago (the other place is copies.py). Now that I see
> your patch, I wonder why I didn't see that simple solution.
>
>>
>> diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
>> --- a/mercurial/cmdutil.py
>> +++ b/mercurial/cmdutil.py
>> @@ -2977,10 +2977,12 @@ def revert(ui, repo, ctx, parents, *pats
>>          modadded = set()
>>
>>          # split between files known in target manifest and the others
>
> Does it make sense to leave this comment here? I'm not sure what its
> scope is meant to be even before this patch.

Me either, which is why I left it... We can probably remove it though. 
Want me to resend, or you want to just drop this line during queueing, 
since I don't think I'm changing the bits below.

>> -        smf = set(mf)
>>
>>          # determine the exact nature of the deleted changesets
>> -        deladded = _deleted - smf
>> +        deladded = set(_deleted)
>> +        for path in _deleted:
>> +            if path in mf:
>> +                deladded.remove(path)
>
> Would it be better to not add it to the set in the first place?  Like so:
>
>> -        deladded = _deleted - smf
>> +        deladded = set()
>> +        for path in _deleted:
>> +            if path not in mf:
>> +                deladded.add(path)
>
> I can't see it making any difference in practice, so pick whichever
> seems simpler (i.e. feel free to ignore).

The set constructor does a smart allocation when doing a set copy, so we 
avoid internal set-growth reallocations by starting big and going small. 
  My assumption is that this stuff is going to be pretty small 
regardless, so it probably doesn't matter really.

See line 660 
http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup

>>          deleted = _deleted - deladded
>>
>>          # We need to account for the state of the file in the dirstate,
>> @@ -3024,7 +3026,10 @@ def revert(ui, repo, ctx, parents, *pats
>>          # in case of merge, files that are actually added can be reported as
>>          # modified, we need to post process the result
>>          if p2 != nullid:
>> -            mergeadd = dsmodified - smf
>> +            mergeadd = set(dsmodified)
>> +            for path in dsmodified:
>> +                if path in mf:
>> +                    mergeadd.remove(path)
>
> Same here.
>
>>              dsadded |= mergeadd
>>              dsmodified -= mergeadd
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=uZTCRgAVm8p5eoX9JtG7NXyHYRWAtyDglOxpwq5zkdk&s=PC6LNzvhXaSgoieJBJpN2q_sfLvi5N6IBK0Iv9VZcCg&e=


More information about the Mercurial-devel mailing list