Clone performance for files consisting of many zeros differs significantly on Linux and Windows.
Angel Ezquerra
angel.ezquerra at gmail.com
Mon Nov 26 03:00:00 CST 2012
On Mon, Nov 26, 2012 at 12:52 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Fri, 2012-11-09 at 11:52 -0600, Matt Mackall wrote:
>> On Fri, 2012-11-09 at 09:56 +0000, Schueler Nikolaus (LQKG IT RDS)
>> wrote:
>> > Hi Matt,
>> >
>> > is there something we can do about this in the near future or does
>> > that need deeper research and rework for the Windows implementation?
>> > (I fear the second alternative may be more probable). If I could assist
>> > in doing research or even a fix, I would be glad to help.
>>
>> It's probably the sort of thing you can fix in an afternoon of
>> tinkering. The usual way to deal with this is to switch from a style
>> like this:
>>
>> a = ""
>> while len(a) < wanted:
>>     a += more()
>> return a
>>
>> where += implies a quadratic amount of copying to something like this:
>>
>> a = []
>> l = 0
>> while l < wanted:
>>     a.append(more())
>>     l += len(a[-1])
>> return ''.join(a)
>>
>> ...which doesn't.
>
> And you wrote:
>
>> So you mean, in other words, this code was written ignoring the usual
>> performance tips for Python ( ;-):
>
>> http://wiki.python.org/moin/PythonSpeed/PerformanceTips , section
>> "String concatenation".
>
>
> Here's a patch for benchmarking the path in question:
>
> diff -r d0d99c8bdf51 contrib/perf.py
> --- a/contrib/perf.py Wed Nov 07 14:49:44 2012 +0100
> +++ b/contrib/perf.py Sun Nov 25 17:06:01 2012 -0600
> @@ -55,6 +55,17 @@
> cl._nodecache = {nullid: nullrev}
> cl._nodepos = None
>
> +def perfchunk(ui, repo, bufsize, readsize):
> +    def d():
> +        l = [' ' * int(bufsize)]
> +        c = util.chunkbuffer(l)
> +        rs = int(readsize)
> +        while True:
> +            r = c.read(rs)
> +            if r == '':
> +                break
> +    timer(d)
> +
>  def perfheads(ui, repo):
>      cl = repo.changelog
>      def d():
> @@ -230,6 +241,7 @@
>
>  cmdtable = {
>      'perfcca': (perfcca, []),
> +    'perfchunk': (perfchunk, [], "BUFSIZE READSIZE"),
>      'perffncacheload': (perffncacheload, []),
>      'perffncachewrite': (perffncachewrite, []),
>      'perffncacheencode': (perffncacheencode, []),
>
>
> On Linux:
>
> $ hgs perfchunk 100000000 1000
> ! wall 0.830140 comb 0.830000 user 0.750000 sys 0.080000 (best of 12)
> $ hg perfchunk 100000000 100000000
> ! wall 0.117355 comb 0.110000 user 0.030000 sys 0.080000 (best of 84)
> $ hg perfchunk 200000000 200000000
> ! wall 0.239198 comb 0.220000 user 0.050000 sys 0.170000 (best of 41)
>
> On Wine:
>
> C:\hg>hg perfchunk 100000000 1000
> ! wall 2.660000 comb 2.660000 user 1.290000 sys 1.370000 (best of 4)
> C:\hg\contrib>hg perfchunk 100000000 100000000
> ! wall 12.236000 comb 12.010000 user 4.560000 sys 7.450000 (best of 3)
> C:\hg\contrib>hg perfchunk 200000000 200000000
> ! wall 49.851000 comb 47.960000 user 17.920000 sys 30.040000 (best of 3)
>
> So there's our quadratic-only-on-Windows behavior.
>
> After the "obvious fix":
>
> diff -r d0d99c8bdf51 mercurial/util.py
> --- a/mercurial/util.py Wed Nov 07 14:49:44 2012 +0100
> +++ b/mercurial/util.py Sun Nov 25 17:27:23 2012 -0600
> @@ -899,7 +899,7 @@
>          """Read L bytes of data from the iterator of chunks of data.
>          Returns less than L bytes if the iterator runs dry."""
>          left = l
> -        buf = ''
> +        buf = []
>          queue = self._queue
>          while left > 0:
>              # refill the queue
> @@ -917,11 +917,11 @@
>              left -= len(chunk)
>              if left < 0:
>                  queue.appendleft(chunk[left:])
> -                buf += chunk[:left]
> +                buf.append(chunk[:left])
>              else:
> -                buf += chunk
> +                buf.append(chunk)
>
> -        return buf
> +        return ''.join(buf)
>
>  def filechunkiter(f, size=65536, limit=None):
>      """Create a generator that produces the data in the file size
>
> we get:
>
> Linux:
>
> $ hg perfchunk 100000000 1000
> ! wall 0.863241 comb 0.860000 user 0.800000 sys 0.060000 (best of 12)
> $ hg perfchunk 100000000 100000000
> ! wall 0.165348 comb 0.150000 user 0.040000 sys 0.110000 (best of 61)
> $ hg perfchunk 200000000 200000000
> ! wall 0.329871 comb 0.310000 user 0.090000 sys 0.220000 (best of 30)
>
> Wine:
>
> C:\hg\contrib>hg perfchunk 100000000 1000
> ! wall 2.205000 comb 2.190000 user 1.160000 sys 1.030000 (best of 5)
> C:\hg\contrib>hg perfchunk 100000000 100000000
> ! wall 0.167000 comb 0.160000 user 0.040000 sys 0.120000 (best of 60)
> C:\hg\contrib>hg perfchunk 200000000 200000000
> ! wall 0.336000 comb 0.320000 user 0.070000 sys 0.250000 (best of 30)
>
> This is probably why we didn't do the "obvious fix" the first time
> around: it's 50% SLOWER on the platform that the bulk of contributors
> use and benchmark against.
>
> Please test the fix on a real Windows machine (I don't have one) and
> report back. You may find this useful:
>
> http://mercurial.selenic.com/wiki/HackableMercurial
>
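For anyone following along without the Mercurial source at hand, the shape of
the patched read() can be sketched as a standalone class. This is only a
simplified Python 3 illustration, not the actual util.chunkbuffer: the class
name ChunkBuffer, the use of bytes, and the one-chunk-at-a-time refill are
assumptions made to keep it short.

```python
from collections import deque


class ChunkBuffer:
    """Hypothetical, simplified stand-in for mercurial's util.chunkbuffer."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._queue = deque()

    def read(self, l):
        """Read up to l bytes; return fewer if the iterator runs dry."""
        left = l
        buf = []  # collect pieces in a list and join once at the end
        queue = self._queue
        while left > 0:
            if not queue:
                # Refill with the next chunk; stop when the source is empty.
                chunk = next(self._iter, None)
                if chunk is None:
                    break
                queue.append(chunk)
            chunk = queue.popleft()
            left -= len(chunk)
            if left < 0:
                # The chunk overshoots the request: push the tail back.
                queue.appendleft(chunk[left:])
                buf.append(chunk[:left])
            else:
                buf.append(chunk)
        return b"".join(buf)
```

With this version each piece is copied once by the final join, instead of the
whole accumulated buffer being copied again on every +=.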
I just did some tests with this fix and I posted my results in this thread:
http://markmail.org/message/hm3zocskkrspcygp
My tests show a very noticeable improvement with this "fix" on my
Windows 7 x64 PC.
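To compare the two accumulation styles outside of hg, something like the
following could be used. It is a rough sketch, not the perfchunk command: the
function names concat_plus, concat_join and best_of are made up for this
example, and it uses Python 3 bytes rather than the Python 2 strings in the
patch. The numbers will vary by platform and Python version, as the
measurements in this thread show.

```python
import timeit


def concat_plus(pieces):
    """Accumulate with +=, the style used by the original read()."""
    buf = b""
    for p in pieces:
        buf += p
    return buf


def concat_join(pieces):
    """Accumulate in a list and join once, the style used by the patch."""
    parts = []
    for p in pieces:
        parts.append(p)
    return b"".join(parts)


def best_of(func, pieces, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    return min(timeit.repeat(lambda: func(pieces), number=1, repeat=repeat))


if __name__ == "__main__":
    # About 1 MB of data in 1000-byte pieces (sizes chosen arbitrarily).
    pieces = [b" " * 1000] * 1000
    for func in (concat_plus, concat_join):
        print("%-12s wall %.6f" % (func.__name__, best_of(func, pieces)))
```

Increasing the number of pieces should make any quadratic behaviour easier to
see, since doubling the input would then roughly quadruple the time.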
Cheers,
Angel