Clone performance for files consisting of many zeros differs significantly between Linux and Windows.

Mon Nov 26 17:07:07 CST 2012

On Mon, 2012-11-26 at 10:00 +0100, Angel Ezquerra wrote:
> On Mon, Nov 26, 2012 at 12:52 AM, Matt Mackall <mpm at selenic.com> wrote:
> > On Fri, 2012-11-09 at 11:52 -0600, Matt Mackall wrote:
> >> On Fri, 2012-11-09 at 09:56 +0000, Schueler Nikolaus (LQKG IT RDS)
> >> wrote:
> >> > Hi Matt,
> >> >
> >> > is there something we can do about this in the near future or does
> >> > that need deeper research and rework for the Windows implementation?
> > >> > (I fear the second alternative may be more probable). If I could assist
> > >> > in doing research or even a fix, I would be glad to help.
> >>
> >> It's probably the sort of thing you can fix in an afternoon of
> >> tinkering. The usual way to deal with this is to switch from a style
> >> like this:
> >>
> >> a = ""
> >> while len(a) < wanted:
> >>     a += more()
> >> return a
> >>
> >> where += implies a quadratic amount of copying to something like this:
> >>
> >> a = []
> > >> l = 0
> >> while l < wanted:
> >>     a.append(more())
> >>     l += len(a[-1])
> >> return ''.join(a)
> >>
> >> ..which doesn't.
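A runnable version of the two styles above, as a sketch. It assumes `more` is a hypothetical zero-argument callable that returns the next chunk, or an empty chunk once the source is exhausted, and it uses Python 3 bytes in place of the str of the original. Like the pseudocode, both functions stop as soon as at least `wanted` bytes have accumulated, so they can return slightly more than `wanted`.

```python
from typing import Callable, Iterable, List


def make_source(chunks: Iterable[bytes]) -> Callable[[], bytes]:
    """Hypothetical helper: wrap an iterable as a more()-style callable."""
    it = iter(chunks)
    return lambda: next(it, b"")


def read_concat(wanted: int, more: Callable[[], bytes]) -> bytes:
    """Naive style: each += may copy everything accumulated so far."""
    a = b""
    while len(a) < wanted:
        chunk = more()
        if not chunk:  # source exhausted
            break
        a += chunk
    return a


def read_join(wanted: int, more: Callable[[], bytes]) -> bytes:
    """List-and-join style: each chunk is copied once, in the final join."""
    parts: List[bytes] = []
    total = 0
    while total < wanted:
        chunk = more()
        if not chunk:  # source exhausted
            break
        parts.append(chunk)
        total += len(chunk)  # track length without building the string
    return b"".join(parts)
```

Both return the same data. Only the amount of intermediate copying differs.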
> >
> > And you wrote:
> >
> > >> So you mean, in other words, that this code was written ignoring the usual
> > >> performance tips for Python (;-):
> >
> >> http://wiki.python.org/moin/PythonSpeed/PerformanceTips , section
> >> "String concatenation".
> >
> >
> > Here's a patch for benchmarking the path in question:
> >
> > diff -r d0d99c8bdf51 contrib/perf.py
> > --- a/contrib/perf.py   Wed Nov 07 14:49:44 2012 +0100
> > +++ b/contrib/perf.py   Sun Nov 25 17:06:01 2012 -0600
> > @@ -55,6 +55,17 @@
> >          cl._nodecache = {nullid: nullrev}
> >          cl._nodepos = None
> >
> > +def perfchunk(ui, repo, bufsize, readsize):
> > +    def d():
> > +        l = [' ' * int(bufsize)]
> > +        c = util.chunkbuffer(l)
> > +        rs = int(readsize)
> > +        while True:
> > +            r = c.read(rs)
> > +            if r == '':
> > +                break
> > +    timer(d)
> > +
> >  def perfheads(ui, repo):
> >      cl = repo.changelog
> >      def d():
> > @@ -230,6 +241,7 @@
> >
> >  cmdtable = {
> >      'perfcca': (perfcca, []),
> > +    'perfchunk': (perfchunk, [], "BUFSIZE READSIZE"),
> >      'perffncacheload': (perffncacheload, []),
> >      'perffncachewrite': (perffncachewrite, []),
> >      'perffncacheencode': (perffncacheencode, []),
> >
> >
> > On Linux:
> >
> > $ hgs perfchunk 100000000 1000
> > ! wall 0.830140 comb 0.830000 user 0.750000 sys 0.080000 (best of 12)
> > $ hg perfchunk 100000000 100000000
> > ! wall 0.117355 comb 0.110000 user 0.030000 sys 0.080000 (best of 84)
> > $ hg perfchunk 200000000 200000000
> > ! wall 0.239198 comb 0.220000 user 0.050000 sys 0.170000 (best of 41)
> >
> > On Wine:
> >
> > C:\hg>hg perfchunk 100000000 1000
> > ! wall 2.660000 comb 2.660000 user 1.290000 sys 1.370000 (best of 4)
> > C:\hg\contrib>hg perfchunk 100000000 100000000
> > ! wall 12.236000 comb 12.010000 user 4.560000 sys 7.450000 (best of 3)
> > C:\hg\contrib>hg perfchunk 200000000 200000000
> > ! wall 49.851000 comb 47.960000 user 17.920000 sys 30.040000 (best of 3)
> >
> > So there's our quadratic-only-on-Windows behavior.
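Here is a rough cost model for these numbers. It assumes each read accumulates n pieces of a fixed size c (the piece size is not stated in this thread) and that every += copies the whole accumulated buffer:

```latex
\text{bytes copied by } {+}{=} \;=\; \sum_{k=1}^{n} k\,c \;=\; c\,\frac{n(n+1)}{2} \;\approx\; \frac{c\,n^{2}}{2},
\qquad
\text{bytes copied by list} + \texttt{join} \;=\; n\,c .
```

Doubling the read size N = nc at fixed c doubles n. The quadratic cost should therefore grow about 4x and the linear cost about 2x. The measurements fit this model: on Wine 49.851 / 12.236 ≈ 4.07, and on Linux 0.239198 / 0.117355 ≈ 2.04. One plausible explanation for Linux staying linear, not verified in this thread, is that CPython grows the string in place via realloc. The Linux allocator can often satisfy that for large blocks without copying, while the allocator under Windows/Wine may copy every time.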
> >
> > After the "obvious fix":
> >
> > diff -r d0d99c8bdf51 mercurial/util.py
> > --- a/mercurial/util.py Wed Nov 07 14:49:44 2012 +0100
> > +++ b/mercurial/util.py Sun Nov 25 17:27:23 2012 -0600
> > @@ -899,7 +899,7 @@
> >          """Read L bytes of data from the iterator of chunks of data.
> >          Returns less than L bytes if the iterator runs dry."""
> >          left = l
> > -        buf = ''
> > +        buf = []
> >          queue = self._queue
> >          while left > 0:
> >              # refill the queue
> > @@ -917,11 +917,11 @@
> >              left -= len(chunk)
> >              if left < 0:
> >                  queue.appendleft(chunk[left:])
> > -                buf += chunk[:left]
> > +                buf.append(chunk[:left])
> >              else:
> > -                buf += chunk
> > +                buf.append(chunk)
> >
> > -        return buf
> > +        return ''.join(buf)
> >
> >  def filechunkiter(f, size=65536, limit=None):
> >      """Create a generator that produces the data in the file size
> >
> > we get:
> >
> > Linux:
> >
> > $ hg perfchunk 100000000 1000
> > ! wall 0.863241 comb 0.860000 user 0.800000 sys 0.060000 (best of 12)
> > $ hg perfchunk 100000000 100000000
> > ! wall 0.165348 comb 0.150000 user 0.040000 sys 0.110000 (best of 61)
> > $ hg perfchunk 200000000 200000000
> > ! wall 0.329871 comb 0.310000 user 0.090000 sys 0.220000 (best of 30)
> >
> > Wine:
> >
> > C:\hg\contrib>hg perfchunk 100000000 1000
> > ! wall 2.205000 comb 2.190000 user 1.160000 sys 1.030000 (best of 5)
> > C:\hg\contrib>hg perfchunk 100000000 100000000
> > ! wall 0.167000 comb 0.160000 user 0.040000 sys 0.120000 (best of 60)
> > C:\hg\contrib>hg perfchunk 200000000 200000000
> > ! wall 0.336000 comb 0.320000 user 0.070000 sys 0.250000 (best of 30)
> >
> > This is probably why we didn't do the "obvious fix" the first time
> > around: for the large reads it's roughly 40% SLOWER on the platform that
> > the bulk of contributors use and benchmark against.
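The following is a self-contained sketch of the patched read() above, for experimenting outside Mercurial. `ChunkBufferSketch` is a hypothetical, simplified stand-in, not Mercurial's `util.chunkbuffer`. Its refill step, which is elided in the diff, simply pulls one chunk from the iterator whenever the queue is empty, and it uses Python 3 bytes.

```python
import collections
from typing import Deque, Iterable, Iterator, List


class ChunkBufferSketch:
    """Hypothetical, simplified stand-in for util.chunkbuffer."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._iter: Iterator[bytes] = iter(chunks)
        self._queue: Deque[bytes] = collections.deque()

    def read(self, l: int) -> bytes:
        """Read l bytes; return fewer if the iterator runs dry."""
        left = l
        buf: List[bytes] = []  # collect pieces instead of buf += chunk
        queue = self._queue
        while left > 0:
            # refill the queue (simplified: one chunk at a time)
            if not queue:
                chunk = next(self._iter, None)
                if chunk is None:
                    break
                queue.append(chunk)
            chunk = queue.popleft()
            left -= len(chunk)
            if left < 0:
                # keep the unread tail queued for the next read()
                queue.appendleft(chunk[left:])
                buf.append(chunk[:left])
            else:
                buf.append(chunk)
        return b"".join(buf)  # single copy of all pieces
```

For example, `ChunkBufferSketch([b" " * 10**6]).read(1000)` returns the first 1000 bytes and leaves the rest queued. That roughly mirrors the read loop perfchunk exercises.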
> >
> > Please test the fix on a real Windows machine (I don't have one) and
> > report back. You may find this useful:
> >
> > http://mercurial.selenic.com/wiki/HackableMercurial
> >
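For anyone reproducing this without a Mercurial checkout, here is a rough standalone comparison under assumed parameters. The 256 KiB piece size and the total sizes are arbitrary choices, not taken from perf.py. It uses Python 3 str because, as far as I know, CPython's in-place += shortcut applies to str and not to bytes. If += is linear on your platform, doubling the total should roughly double its time. Quadratic behavior shows up as roughly 4x.

```python
import time
from typing import Callable, List


def concat_plus(chunks: List[str]) -> str:
    """Accumulate with +=; may copy the growing buffer on each step."""
    buf = ""
    for chunk in chunks:
        buf += chunk
    return buf


def concat_join(chunks: List[str]) -> str:
    """Collect pieces, then copy everything once with join."""
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


def best_of(fn: Callable[[List[str]], str], chunks: List[str], repeat: int = 3) -> float:
    """Best wall-clock time over `repeat` runs, in the spirit of '(best of N)'."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(chunks)
        best = min(best, time.perf_counter() - start)
    return best


def compare(total: int, piece: int = 1 << 18) -> None:
    # total // piece references to one shared string of `piece` spaces
    chunks = [" " * piece] * (total // piece)
    for name, fn in (("+=", concat_plus), ("join", concat_join)):
        print(f"{name:<5} total={total} best={best_of(fn, chunks):.3f}s")


if __name__ == "__main__":
    # Scale these toward 100000000 / 200000000 to match the runs above.
    for total in (20_000_000, 40_000_000):
        compare(total)
```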
> 
> I just did some tests with this fix and I posted my results in this thread:
> 
> http://markmail.org/message/hm3zocskkrspcygp
> 
> My tests show a very noticeable improvement with this "fix" on my
> Windows 7 x64 PC.

Alright. I've queued up the fix for stable after beating it up for a
bit.

Wanted: an algorithm to replace util.chunkbuffer that uses less than 2x memory for
very large reads. This may be beyond the capabilities of Python.
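One direction, sketched under assumptions: preallocate the result and copy each chunk into it, instead of holding a list of chunks and then a joined copy of them. `read_into_preallocated` is a hypothetical helper, not part of Mercurial. It returns a bytearray because converting to bytes would make a second full copy. It also does not keep leftover tail data for a later read the way chunkbuffer does.

```python
from typing import Iterable


def read_into_preallocated(chunks: Iterable[bytes], l: int) -> bytearray:
    """Hypothetical helper: copy up to l bytes from chunks into one
    preallocated bytearray, so the result is never built twice."""
    buf = bytearray(l)
    pos = 0
    with memoryview(buf) as view:
        for chunk in chunks:
            if pos >= l:
                break
            n = min(len(chunk), l - pos)
            # memoryview slicing avoids an extra copy of a partial chunk
            view[pos:pos + n] = memoryview(chunk)[:n]
            pos += n
    # the view is released here, so resizing buf is allowed
    if pos < l:
        del buf[pos:]  # iterator ran dry: trim the unused tail
    return buf
```

Whether this actually helps depends on the callers accepting a mutable buffer, which is an open question rather than something established here.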

-- 
Mathematics is the supreme nostalgia of our time.