[PATCH] findrenames: sort files not by object id but by path for stable result

Ryan McElroy rm at fb.com
Sat Mar 25 08:44:41 EDT 2017



On 3/23/17 2:00 PM, Yuya Nishihara wrote:
> On Wed, 22 Mar 2017 16:14:01 +0000, Ryan McElroy wrote:
>> On 3/22/17 1:48 PM, Yuya Nishihara wrote:
>>> # HG changeset patch
>>> # User Yuya Nishihara <yuya at tcha.org>
>>> # Date 1426413536 -32400
>>> #      Sun Mar 15 18:58:56 2015 +0900
>>> # Node ID 425379e72e817b6c79e955173b015b9a0fe090e9
>>> # Parent  102f291807c92864a2231e5e925d6cd64783bb59
>>> findrenames: sort files not by object id but by path for stable result
>>>
>>> It seems the original implementation tried to sort added/removed files
>>> alphabetically, but it actually sorted fctx objects by memory address.
>>>
>>> This patch removes the use of set()s in order to preserve the order of
>>> added/removed files. addedfiles.remove() could be slightly slower than
>>> before, but it doesn't make much difference.
>> ...except perhaps on a large number of files added/removed, right? Why
>> not re-sort and move the possible O(n^2) to at worst O(n log n)? It
>> doesn't seem like it would be too much more work to me.
>>> benchmark (on tmpfs):
>>>     $ hg up -C 15afda349b11; hg purge --all; mv tests tests.new
>>>     $ LANG=C hg --time addremove -n > /dev/null
>>>
>>>     original:   real 0.420 secs (user 0.390+0.000 sys 0.030+0.000)
>>>     this patch: real 0.430 secs (user 0.390+0.000 sys 0.040+0.000)
>> I think the algorithm scales O(n^2), so I'd be much more interested in a
>> move of, say, 100k files than a move of 1 file that is dominated by
>> startup time.
> This moves the "tests" directory, which contains ~600 files. I'll show
> numbers for 50k files in V2, but this quadratic list.remove() is relatively
> cheap in "hg addremove".

Ah, of course you're right there -- sorry I missed this (facepalm).

Thanks for offering to run a larger move so I don't have to worry too
much about the scalability, though; I appreciate it.
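
For the record, here's roughly the re-sort alternative I had in mind. It's
a minimal sketch with hypothetical names, not the actual similar.py code;
it assumes addedfiles is a list kept sorted by path (the point of the patch:
iterating a set of fctx objects gives an address-dependent order) and
matched is the subset of those paths that found a rename candidate:

    def drop_matched_quadratic(addedfiles, matched):
        # What the patch does: each list.remove() is O(n), so removing m
        # matched entries from an n-element list is O(n*m) worst case.
        for path in matched:
            addedfiles.remove(path)
        return addedfiles

    def drop_matched_linear(addedfiles, matched):
        # The alternative: a single filtering pass (or a re-sort at the
        # end) keeps the stable path order at O(n) / O(n log n).
        matchedset = set(matched)
        return [path for path in addedfiles if path not in matchedset]

Either way it only matters once the number of added files gets large, and
your numbers suggest it doesn't in practice.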


