[PATCH 1 of 3 v4] graft: support grafting across move/copy (issue4028)

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Aug 5 19:16:07 EDT 2016



On 08/04/2016 05:00 PM, Gábor STEFANIK wrote:
>>>>> +        unmatched = operator.add(*_computenonoverlap(repo, c1, c2,
>>>>> +                                 m1.filesnotin(ma), m2.filesnotin(ma), False))
>>>>> +
>>>>
>>>> So last time I pointed we should remove this operator.add. But you had a
>>>> comment about memory. can you elaborate?
>>>
>>> This way, computenonoverlap returns two lists, which operator.add can
>> combine
>>> in-place. If we instead saved the results in two temporary lists, it would
>> have to
>>> then duplicate them for the addition, so that a later write into the
>> temporary list
>>> doesn't change the sum.
>>>
>>> That's assuming the Python interpreter supports move semantics, just like
>> C++11.
>>> (I'm pretty sure it does.)
>>
>> I'm pretty sure you are way too optimistic about the Python interpreter.
>> I'm fairly certain it allocate two lists objects without doing anything
>> smart.
>
> IIRC it will do this with operator.add:
> -allocate an unnamed list of len(arg1) and store arg1 in it,
> -allocate an unnamed list of len(arg2) and store arg2 in it,
> -combine arg1 and arg2 in place and name it "unmatched".
>
> With named temporary variables, Python will instead:
> -allocate a named list of len(arg1) and store arg1 in it,
> -allocate a named list of len(arg2) and store arg2 in it,
> -allocate a list named "unmatched" of len(arg1+arg2) and copy elements from arg1 and arg2 into it.
>
> Because the two temporary lists are named, Python can't combine them in-place (otherwise a write into unmatched will change the temporaries and vice versa).
> With operator.add, the lists remain unnamed, so Python doesn't have to worry about future access to them, and is thus able to perform the concatenation in place (by simply changing the 1st list's final ->next pointer to point to the 2nd list's beginning instead of null).
>
> Memory use with named temporaries: len(arg1)+len(arg2)+len(arg1+arg2) = 2*len(arg1+arg2), with operator.add: len(arg1)+len(arg2) = len(arg1+arg2), difference: a factor of 2.
>
> (Quick note: pypy is able to optimize the temporary variables case, since it pre-interprets/JIT-compiles the whole function before beginning to execute it. Regular CPython doesn't do that, and will retain the temporaries until they go out of scope.)

I would be happy to be proven wrong, but I think you are way too 
optimistic about python here. The list1 and list2 are allocated in the 
upper function so I don't see Python doing anything smart here.

Could we use the simple version (temporary variable and "+"). Fell free 
to profive with data and number about how operator.add helps afterward 
to improve the code if you think this is worth it.
In all case, the memory usage of this will not matter much in regards 
with what's involved in the rest of the merge.

[…]
>>>>>      if of in ma:
>>>>>          diverge.setdefault(of, []).append(f)
>>>>> +    elif of in mta:
>>>>> +        if backwards:
>>>>> +            copyfrom[of] = f
>>>>> +        else:
>>>>> +            copyto[of] = f
>>>>
>>>> In the case where 'ca' is 'cta', this means we'll do the same (possibly
>>>> costly) membership test twice. As this is the common case, could we
>> make
>>>> the code a bit smarter here?
>>>
>>> The alternative is "elif cta != ca and of in mta", which trades a membership
>> test for a complex-object equality test, not sure which one is faster.
>>
>> The object should easily compare in O(1) (with eventually a int
>> comparison) the manifest membership test will get significant more
>> costly as the manifest grow and complexify.
>
> If we are able to compare two changectx's or two manifests like that without deep-comparing their members, then I'll change it to that.

Yes we are.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list