Clone performance for files consisting of many zeros differs significantly on Linux and Windows.

Angel Ezquerra angel.ezquerra at gmail.com
Mon Nov 26 03:00:00 CST 2012


On Mon, Nov 26, 2012 at 12:52 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Fri, 2012-11-09 at 11:52 -0600, Matt Mackall wrote:
>> On Fri, 2012-11-09 at 09:56 +0000, Schueler Nikolaus (LQKG IT RDS)
>> wrote:
>> > Hi Matt,
>> >
>> > is there something we can do about this in the near future or does
>> > that need deeper research and rework for the Windows implementation?
>> > (I fear the second alternative may be more probable). If I could assist
>> > in doing research or even a fix, I would be glad to help.
>>
>> It's probably the sort of thing you can fix in an afternoon of
>> tinkering. The usual way to deal with this is to switch from a style
>> like this:
>>
>> a = ""
>> while len(a) < wanted:
>>     a += more()
>> return a
>>
>> where += implies a quadratic amount of copying to something like this:
>>
>> a = []
>> l = 0
>> while l < wanted:
>>     a.append(more())
>>     l += len(a[-1])
>> return ''.join(a)
>>
>> ..which doesn't.
>
> And you wrote:
>
>> So you mean,  in other words, this code was written ignoring the usual
>> performance tips for Python ( ;-):
>
>> http://wiki.python.org/moin/PythonSpeed/PerformanceTips , section
>> "String concatenation".
>
>
> Here's a patch for benchmarking the path in question:
>
> diff -r d0d99c8bdf51 contrib/perf.py
> --- a/contrib/perf.py   Wed Nov 07 14:49:44 2012 +0100
> +++ b/contrib/perf.py   Sun Nov 25 17:06:01 2012 -0600
> @@ -55,6 +55,17 @@
>          cl._nodecache = {nullid: nullrev}
>          cl._nodepos = None
>
> +def perfchunk(ui, repo, bufsize, readsize):
> +    def d():
> +        l = [' ' * int(bufsize)]
> +        c = util.chunkbuffer(l)
> +        rs = int(readsize)
> +        while True:
> +            r = c.read(rs)
> +            if r == '':
> +                break
> +    timer(d)
> +
>  def perfheads(ui, repo):
>      cl = repo.changelog
>      def d():
> @@ -230,6 +241,7 @@
>
>  cmdtable = {
>      'perfcca': (perfcca, []),
> +    'perfchunk': (perfchunk, [], "BUFSIZE READSIZE"),
>      'perffncacheload': (perffncacheload, []),
>      'perffncachewrite': (perffncachewrite, []),
>      'perffncacheencode': (perffncacheencode, []),
>
>
> On Linux:
>
> $ hg perfchunk 100000000 1000
> ! wall 0.830140 comb 0.830000 user 0.750000 sys 0.080000 (best of 12)
> $ hg perfchunk 100000000 100000000
> ! wall 0.117355 comb 0.110000 user 0.030000 sys 0.080000 (best of 84)
> $ hg perfchunk 200000000 200000000
> ! wall 0.239198 comb 0.220000 user 0.050000 sys 0.170000 (best of 41)
>
> On Wine:
>
> C:\hg>hg perfchunk 100000000 1000
> ! wall 2.660000 comb 2.660000 user 1.290000 sys 1.370000 (best of 4)
> C:\hg\contrib>hg perfchunk 100000000 100000000
> ! wall 12.236000 comb 12.010000 user 4.560000 sys 7.450000 (best of 3)
> C:\hg\contrib>hg perfchunk 200000000 200000000
> ! wall 49.851000 comb 47.960000 user 17.920000 sys 30.040000 (best of 3)
>
> So there's our quadratic-only-on-Windows behavior.
>
> After the "obvious fix":
>
> diff -r d0d99c8bdf51 mercurial/util.py
> --- a/mercurial/util.py Wed Nov 07 14:49:44 2012 +0100
> +++ b/mercurial/util.py Sun Nov 25 17:27:23 2012 -0600
> @@ -899,7 +899,7 @@
>          """Read L bytes of data from the iterator of chunks of data.
>          Returns less than L bytes if the iterator runs dry."""
>          left = l
> -        buf = ''
> +        buf = []
>          queue = self._queue
>          while left > 0:
>              # refill the queue
> @@ -917,11 +917,11 @@
>              left -= len(chunk)
>              if left < 0:
>                  queue.appendleft(chunk[left:])
> -                buf += chunk[:left]
> +                buf.append(chunk[:left])
>              else:
> -                buf += chunk
> +                buf.append(chunk)
>
> -        return buf
> +        return ''.join(buf)
>
>  def filechunkiter(f, size=65536, limit=None):
>      """Create a generator that produces the data in the file size
>
> we get:
>
> Linux:
>
> $ hg perfchunk 100000000 1000
> ! wall 0.863241 comb 0.860000 user 0.800000 sys 0.060000 (best of 12)
> $ hg perfchunk 100000000 100000000
> ! wall 0.165348 comb 0.150000 user 0.040000 sys 0.110000 (best of 61)
> $ hg perfchunk 200000000 200000000
> ! wall 0.329871 comb 0.310000 user 0.090000 sys 0.220000 (best of 30)
>
> Wine:
>
> C:\hg\contrib>hg perfchunk 100000000 1000
> ! wall 2.205000 comb 2.190000 user 1.160000 sys 1.030000 (best of 5)
> C:\hg\contrib>hg perfchunk 100000000 100000000
> ! wall 0.167000 comb 0.160000 user 0.040000 sys 0.120000 (best of 60)
> C:\hg\contrib>hg perfchunk 200000000 200000000
> ! wall 0.336000 comb 0.320000 user 0.070000 sys 0.250000 (best of 30)
>
> This is probably why we didn't do the "obvious fix" the first time
> around: it's 50% SLOWER on the platform that the bulk of contributors
> use and benchmark against.
>
> Please test the fix on a real Windows machine (I don't have one) and
> report back. You may find this useful:
>
> http://mercurial.selenic.com/wiki/HackableMercurial
>

I just did some tests with this fix and I posted my results in this thread:

http://markmail.org/message/hm3zocskkrspcygp

My tests show a very noticeable improvement with this "fix" on my
Windows 7 x64 PC.
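
For anyone who wants to try the idea outside of Mercurial, here is a
minimal sketch of the list-and-join approach from the patch above. It is
not the actual util.chunkbuffer code: the ChunkBuffer class and the drain
helper are hypothetical names made up for illustration, it is written for
Python 3 bytes, and it assumes the incoming chunks are reasonably small.

```python
from collections import deque


class ChunkBuffer:
    """Simplified stand-in for a chunk buffer (hypothetical, not Mercurial's class)."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._queue = deque()

    def read(self, n):
        """Return up to n bytes; fewer if the iterator runs dry."""
        left = n
        parts = []  # collect pieces in a list instead of growing a string with +=
        while left > 0:
            if not self._queue:
                chunk = next(self._iter, None)
                if chunk is None:
                    break  # no more data
                self._queue.append(chunk)
            chunk = self._queue.popleft()
            if len(chunk) > left:
                # Take what we need and push the remainder back for the next read.
                parts.append(chunk[:left])
                self._queue.appendleft(chunk[left:])
                left = 0
            else:
                parts.append(chunk)
                left -= len(chunk)
        # A single join copies each piece once, whatever the platform allocator does.
        return b"".join(parts)


def drain(chunks, readsize):
    """Read everything in readsize steps and return the total byte count."""
    buf = ChunkBuffer(chunks)
    total = 0
    while True:
        data = buf.read(readsize)
        if not data:
            break
        total += len(data)
    return total
```

Calling drain([b" " * 1000] * 1000, 4096) walks the same kind of read loop
that perfchunk times, just on a much smaller input.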

Cheers,

Angel

