[PATCH 3 of 3 RFC] streamclone: use backgroundfilecloser (issue4889)

Gregory Szorc gregory.szorc at gmail.com
Sun Jan 3 11:41:39 CST 2016


On Sun, Jan 3, 2016 at 2:35 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:

> On 2016-01-03 04:15, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1451780015 28800
> > #      Sat Jan 02 16:13:35 2016 -0800
> > # Node ID f54785fd5e944ff0b58d8e807b28497b262a1530
> > # Parent  59042fecfa95376b35b22c97b8b71d3a02d26832
> > streamclone: use backgroundfilecloser (issue4889)
> >
> > Closing files that have been appended to is slow on Windows/NTFS.
> > CloseHandle() calls on this platform often take 1-10ms - and that's
> > on my i7-6700K Skylake processor with a modern and fast SSD. Contrast
> > with other I/O operations, such as writing data, which take <100us.
> >
> > This means that creating/appending thousands of files can add
> > significant overhead. For example, cloning mozilla-central creates
> > ~232,000 revlog files. Assuming 1ms per CloseHandle(), that
> > yields 232s (3:52) of wall time waiting for file closes!
> >
> > The impact of this overhead can be measured most directly when applying
> > stream clone bundles. Applying these bundles is effectively unpacking
> > a tar archive.
> >
> > Using a RAM disk (read: no I/O wait), the difference in wall time for a
> > `hg debugapplystreamclonebundle` for a ~1731 MB mozilla-central bundle
> > between Windows and Linux from the same machine is drastic:
> >
> > Linux:    ~12.8s (128MB/s)
> > Windows: ~352.0s (4.7MB/s)
> >
> > Windows is ~27.5x slower. Yikes!
> >
> > After this patch:
> >
> > Linux:    ~12.8s (128MB/s)
> > Windows: ~102.1s (16.1MB/s)
> >
> > Windows is now ~3.4x faster. Unfortunately, it is still ~8x slower than
> > Linux. Profiling reveals a few hot code paths that could likely be
> > improved. But that's for another patch.
> >
> > diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
> > --- a/mercurial/streamclone.py
> > +++ b/mercurial/streamclone.py
> > @@ -9,16 +9,17 @@ from __future__ import absolute_import
> >
> >  import struct
> >  import time
> >
> >  from .i18n import _
> >  from . import (
> >      branchmap,
> >      error,
> > +    scmutil,
> >      store,
> >      util,
> >  )
> >
> >  def canperformstreamclone(pullop, bailifbundle2supported=False):
> >      """Whether it is possible to perform a streaming clone as part of
> pull.
> >
> >      ``bailifbundle2supported`` will cause the function to return False if
> > @@ -301,31 +302,32 @@ def consumev1(repo, fp, filecount, bytec
> >          repo.ui.status(_('%d files to transfer, %s of data\n') %
> >                         (filecount, util.bytecount(bytecount)))
> >          handled_bytes = 0
> >          repo.ui.progress(_('clone'), 0, total=bytecount)
> >          start = time.time()
> >
> >          tr = repo.transaction(_('clone'))
> >          try:
> > -            if True:
> > +            with scmutil.backgroundfilecloser() as bfc:
> >                  for i in xrange(filecount):
> >                      # XXX doesn't support '\n' or '\r' in filenames
> >                      l = fp.readline()
> >                      try:
> >                          name, size = l.split('\0', 1)
> >                          size = int(size)
> >                      except (ValueError, TypeError):
> >                          raise error.ResponseError(
> >                              _('unexpected response from remote server:'), l)
> >                      if repo.ui.debugflag:
> >                          repo.ui.debug('adding %s (%s)\n' %
> >                                        (name, util.bytecount(size)))
> >                      # for backwards compat, name was partially encoded
> > -                    with repo.svfs(store.decodedir(name), 'w') as ofp:
> > +                    path = store.decodedir(name)
> > +                    with repo.svfs(path, 'w', filecloser=bfc) as ofp:
> >                          for chunk in util.filechunkiter(fp, limit=size):
> >                              handled_bytes += len(chunk)
> >                              repo.ui.progress(_('clone'), handled_bytes,
> >                                               total=bytecount)
> >                              ofp.write(chunk)
> >              tr.close()
> >          finally:
> >              tr.release()
>
> Does this mean you are calling tr.release() (..and thus releasing the
> write lock..) while there may still be files in open-for-writing-state?
>
>
>
backgroundfilecloser.__exit__ waits for all queued file handles to be
closed. Since that is called before tr.close() and tr.release(), there
shouldn't be any open files.
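
For reference, the closer is roughly this shape (a simplified sketch of the
idea, not the exact code from earlier in the series): handles go onto a
queue, worker threads pull them off and close them, and __exit__ queues a
sentinel per worker and joins the threads, which only returns once every
queued handle has been closed.

# Simplified sketch (Python 2, as in the codebase); illustrative only.
import Queue
import threading

class backgroundfilecloser(object):
    """Close file handles on background threads."""

    def __init__(self, numthreads=4):
        self._running = True
        self._queue = Queue.Queue()
        self._threads = []
        for i in range(numthreads):
            t = threading.Thread(target=self._worker,
                                 name='backgroundfilecloser-%d' % i)
            t.start()
            self._threads.append(t)

    def _worker(self):
        # Pull handles off the queue and close them; a None sentinel tells
        # the worker to exit.
        while True:
            fh = self._queue.get()
            if fh is None:
                return
            fh.close()

    def close(self, fh):
        # Schedule ``fh`` to be closed on a background thread.
        if not self._running:
            raise Exception('can only queue closes inside the "with" block')
        self._queue.put(fh)

    def __enter__(self):
        return self

    def __exit__(self, exctype, excvalue, exctb):
        self._running = False
        # The sentinels are queued behind every pending handle (FIFO), so
        # each worker closes whatever it dequeues before exiting; join()
        # then blocks until all queued handles have been closed.
        for t in self._threads:
            self._queue.put(None)
        for t in self._threads:
            t.join()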

While I'm here, the series is still RFC because it at least needs some
comments and possibly assertions about appropriate usage. For example, if
you open the same path twice with a background closer, there is a race
between the first handle being closed and the second being opened: the
first close could trigger a flush, so the second open might not see all of
the data. We probably also want to guard against using the filecloser
outside a context manager. And we want the thread pool size to be
configurable. Lots of small improvements.
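
To make those last points concrete, the loop in consumev1 could eventually
look something like this (the 'worker.backgroundclosethreadcount' option
and checknotpending() are names invented here to illustrate the
improvements; they are not part of this series):

numthreads = repo.ui.configint('worker', 'backgroundclosethreadcount', 4)
with scmutil.backgroundfilecloser(numthreads=numthreads) as bfc:
    for i in xrange(filecount):
        # ... read ``name`` and ``size`` from fp as in the patch ...
        path = store.decodedir(name)
        # Hypothetical guard for the race above: refuse to reopen a path
        # whose previous handle may still be queued for closing.
        bfc.checknotpending(path)
        with repo.svfs(path, 'w', filecloser=bfc) as ofp:
            for chunk in util.filechunkiter(fp, limit=size):
                ofp.write(chunk)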