[PATCH 01 of 24] copies: make the loss in _backwardcopies more stable

Mads Kiilerich mads at kiilerich.com
Mon Dec 17 20:17:31 CST 2012


Kevin Bullock wrote, On 12/17/2012 05:57 AM:
> On 16 Dec 2012, at 4:33 PM, Mads Kiilerich wrote:
>
>> # HG changeset patch
>> # User Mads Kiilerich <mads at kiilerich.com>
>> # Date 1355687455 -3600
>> # Node ID 36ef75411e38e3cc4b60198ec20750b87b0a7545
>> # Parent  8c9a52492d426741ab24392d49f44a1d4f23613e
>> copies: make the loss in _backwardcopies more stable
>>
>> A couple of tests show slightly more correct output. That is pure coincidence.
>>
>> diff --git a/mercurial/copies.py b/mercurial/copies.py
>> --- a/mercurial/copies.py
>> +++ b/mercurial/copies.py
>> @@ -150,7 +150,7 @@
>>      # in particular, we find renames better than copies
>>      f = _forwardcopies(b, a)
>>      r = {}
>> -    for k, v in f.iteritems():
>> +    for k, v in sorted(f.iteritems()):
> Right out of the gate here, we've got some potentially visible performance-impacting changes in this series. Can we see some numbers at least on the patches (like this one) that have this possibility?

Yes, I share your concern. But what kind of numbers would you like to see?

Adding a sort will obviously not make it faster. (Except for the cases 
where a consistent disk access pattern happens to give better utilization.)

The cost will of course depend on the size of the sorted list, but such 
lists are usually both already sorted and iterated with non-trivial 
constant overhead. Scaling to huge numbers of files would need a general 
solution to that anyway.
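The point about already-sorted input is worth spelling out: Python's sort (Timsort) is adaptive, so re-sorting a list that is already in order costs little more than one linear pass. A rough sketch of the comparison (file names and sizes are made up for illustration; modern Python 3 spelling):

```python
import random
import timeit

# A copy map's items, already in sorted order (as they often would be).
items = [("file%06d" % i, "old%06d" % i) for i in range(100000)]
shuffled = items[:]
random.shuffle(shuffled)

# Timsort detects the existing run in 'items' and finishes quickly;
# the shuffled list needs the full n log n comparisons.
t_presorted = timeit.timeit(lambda: sorted(items), number=10)
t_shuffled = timeit.timeit(lambda: sorted(shuffled), number=10)

assert sorted(shuffled) == items  # same result either way
# t_presorted is typically several times smaller than t_shuffled
```

This only shows the asymptotic behaviour of the sort itself, of course, not the end-to-end impact on a Mercurial operation.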

And in this case the patch introduces a sort of the list of files copied 
between two revisions when making a backward diff/status, a codepath 
that is rarely hit. It might be possible to construct a test case and 
come up with some numbers, but I can't imagine how they would be used or 
what conclusion could be drawn.
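For readers not familiar with the code: the "loss" the patch title refers to can be sketched as follows. This is a simplified, hypothetical version of the inversion done in mercurial/copies.py, not the actual function, written in Python 3 spelling (items() rather than iteritems()):

```python
def backwardcopies(forward):
    """Invert a forward copy map {destination: source}.

    When one source was copied to several destinations, only one
    destination can survive the inversion -- that is the loss.
    Iterating in sorted order makes *which* entry survives
    deterministic, instead of depending on dict iteration order.
    """
    r = {}
    for k, v in sorted(forward.items()):
        r[v] = k  # later (lexicographically larger) k overwrites earlier
    return r

# 'src' was copied to both 'a' and 'b'; with sorted iteration the
# lexicographically last destination deterministically wins.
forward = {"b": "src", "a": "src"}
assert backwardcopies(forward) == {"src": "b"}
```

Without the sort, the surviving entry would depend on hash/iteration order, which is exactly the instability the patch removes.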

/Mads


More information about the Mercurial-devel mailing list