[PATCH 3 of 4] copies: calculate 'bothnew' from manifestdict.filesnotin()

Durham Goode durham at fb.com
Fri Feb 27 19:08:36 CST 2015



On 2/27/15, 2:46 PM, "Martin von Zweigbergk" <martinvonz at google.com> wrote:

># HG changeset patch
># User Martin von Zweigbergk <martinvonz at google.com>
># Date 1425074581 28800
>#      Fri Feb 27 14:03:01 2015 -0800
># Node ID b3bcf58446fdebf3672edbbc55c24509e549eb22
># Parent  89f810fb00184d3a1dd49412d0c3256a596ddca8
>copies: calculate 'bothnew' from manifestdict.filesnotin()
>
>In the same spirit as the previous change, let's now calculate the
>'bothnew' variable using manifestdict.filesnotin().
>
>diff -r 89f810fb0018 -r b3bcf58446fd mercurial/copies.py
>--- a/mercurial/copies.py	Fri Feb 27 14:02:30 2015 -0800
>+++ b/mercurial/copies.py	Fri Feb 27 14:03:01 2015 -0800
>@@ -302,7 +302,9 @@
>         else:
>             diverge2.update(fl) # reverse map for below
> 
>-    bothnew = sorted([d for d in m1 if d in m2 and d not in ma])
>+    addedinm1 = m1.filesnotin(ma)
>+    addedinm2 = m2.filesnotin(ma)
>+    bothnew = sorted(addedinm1 & addedinm2)

I was concerned about perf here (since we're constructing sets when we
used to not, and we're iterating over both m1 and m2 when we used to not),
but I ran the numbers and the new stuff is actually faster. To diff
million-file manifests with 5k files different in each, it previously took
0.36s and now it's 0.23s.
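
For anyone who wants to convince themselves the two forms compute the same
thing, here is a minimal sketch on toy data. It assumes plain dicts in place
of real manifests, and filesnotin() below is a hypothetical stand-in for
manifestdict.filesnotin(), not Mercurial's actual implementation. It is
written in Python 3 syntax.

```python
# Hypothetical stand-in for manifestdict.filesnotin(); operates on plain
# dicts here, not on Mercurial's real manifest class.
def filesnotin(m, other):
    """Return the set of file names in m that are not in other."""
    return set(m) - set(other)


# Toy manifests mapping file name -> node id (values are made up).
ma = {"a": 1, "b": 2}
m1 = {"a": 1, "b": 2, "c": 3, "d": 4}
m2 = {"a": 1, "c": 5, "e": 6}

# Old form: one pass over m1 with two membership tests per file.
old = sorted([f for f in m1 if f in m2 and f not in ma])

# New form: intersect the two "added since the ancestor" sets.
new = sorted(filesnotin(m1, ma) & filesnotin(m2, ma))

assert old == new == ["c"]
print(new)
```

Only "c" is present on both sides and absent from the ancestor, so both
forms yield the same one-element list.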


The script for posterity:

#!/usr/bin/env python
# Python 2 script (uses xrange).
import time

# Three file lists sharing 990k entries, each with 5k entries of its own.
fm1 = list(xrange(0, 990000))
fm2 = list(xrange(0, 990000))
fma = list(xrange(0, 990000))
fm1.extend(xrange(1000000, 1005000))
fm2.extend(xrange(1005000, 1010000))
fma.extend(xrange(1010000, 1015000))

m1 = set(fm1)
m2 = set(fm2)
ma = set(fma)

# Old approach: one pass over fm1 with two set membership tests per entry.
start = time.time()
old = sorted([d for d in fm1 if d in m2 and d not in ma])
print ("Old %s" % str(time.time() - start))

# Stands in for filesnotin(): the entries of d1 that are not in d2.
def doit(d1, d2):
    missing = set(d1)
    missing.difference_update(d2)
    return missing

# New approach: intersect the two "added since ma" sets.
start = time.time()
addedinm1 = doit(fm1, fma)
addedinm2 = doit(fm2, fma)
bothnew = sorted(addedinm1 & addedinm2)
print ("New %s" % str(time.time() - start))

~/local> ./testset.py
Old 0.343271970749
New 0.231050014496
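
If someone wants to repeat the measurement with less noise, a rough sketch
using timeit might look like the following. The helper names (make_inputs,
old_bothnew, notin, new_bothnew, best_of) are made up for this example, the
sizes are scaled down so it runs quickly, and it is written in Python 3
syntax, so absolute numbers will not match the ones above. As in the script,
the entries added on each side are disjoint, so both results are empty lists.

```python
import timeit


def make_inputs(common, changed):
    """Return three lists sharing `common` entries, each with `changed` unique ones."""
    base = list(range(common))
    fm1 = base + list(range(common, common + changed))
    fm2 = base + list(range(common + changed, common + 2 * changed))
    fma = base + list(range(common + 2 * changed, common + 3 * changed))
    return fm1, fm2, fma


def old_bothnew(fm1, m2, ma):
    # One pass over fm1 with two set membership tests per entry.
    return sorted([d for d in fm1 if d in m2 and d not in ma])


def notin(d1, d2):
    # Same shape as doit() in the script: copy d1 into a set, drop d2's entries.
    missing = set(d1)
    missing.difference_update(d2)
    return missing


def new_bothnew(fm1, fm2, fma):
    return sorted(notin(fm1, fma) & notin(fm2, fma))


def best_of(func, repeat=3):
    # Minimum over a few single runs; less sensitive to noise than one sample.
    return min(timeit.repeat(func, number=1, repeat=repeat))


fm1, fm2, fma = make_inputs(common=99000, changed=500)
m2, ma = set(fm2), set(fma)

# Both approaches must agree before their timings are worth comparing.
assert old_bothnew(fm1, m2, ma) == new_bothnew(fm1, fm2, fma)

t_old = best_of(lambda: old_bothnew(fm1, m2, ma))
t_new = best_of(lambda: new_bothnew(fm1, fm2, fma))
print("Old %.6f" % t_old)
print("New %.6f" % t_new)
```

Passing common=990000 and changed=5000 to make_inputs() reproduces the input
sizes used in the script.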




