[PATCH 4 of 5] changegroup: use filecloser for closing filelog handles

Gregory Szorc gregory.szorc at gmail.com
Wed Sep 30 11:09:39 CDT 2015


On Wed, Sep 30, 2015 at 8:50 AM, Augie Fackler <raf at durin42.com> wrote:

> On Tue, Sep 29, 2015 at 10:36:30PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1443552606 25200
> > #      Tue Sep 29 11:50:06 2015 -0700
> > # Node ID a8d8f0d70593aaa6fb2b86dc77d29c021d83277d
> > # Parent  4ade2d6126d9b0f67ea898ab0d021abb88d157dd
> > changegroup: use filecloser for closing filelog handles
>
> Overall, I like where this is headed, but I do wonder if this should
> be part of our vfs layer rather than its own thing - that would also
> give us a way to ensure close() operations finish before a file is
> subsequently reopened (eg if an inline revlog switches to two-file
> during a pull). What do you think?
>

The code purposefully only applies the delayed closing behavior on the
final handles used by revlog.addgroup(), so this won't impact inline to 2
file switch. (Future optimization: revlogs advertise their size inside the
changegroup so we don't waste time doing the 2 file switch.)

However, the code does assume that a revlog is only listed once per
changegroup/bundle. If that's ever not true, we do have a race between
closing and writing.

Moving to the VFS layer is tempting. While I was implementing this, I was
thinking that this work could be the beginning of a generic asynchronous
and multi-threaded I/O facility. That would almost certainly be in the VFS
layer.

Adding to the VFS layer does mean a little extra state tracking. There are
also questions such as "how do we guarantee these open files are closed [so
we can finalize the transition]?"

Let me think about it some more.


>
> >
> > On a large repository like mozilla-central, writing filelogs during
> > changegroup application during a clone opens (and closes) over 200,000
> > files. On Windows, file closes are slow and block the main thread while
> > they are closing.
> >
> > Moving filelog file closes to a background thread on Windows frees up
> > the main thread to process the next filelog before waiting for the close
> > to complete. The end result is faster filelog application.
> >
> > On my Windows 10 machine (not a VM) and with a modern SSD, this patch
> > has a drastic impact on unbundling a snapshot of mozilla-central:
> >
> > Before: ~910s
> > After:  ~609s
> >
> > That's a 301s or 5:01 wall time reduction!
> >
> > And I'm convinced this isn't the upper limit of Windows performance: I
> > made some local changes to measure time spent waiting for the file
> > closer thread's queue to unblock and it said we still spend ~34s of main
> > thread time waiting on file closes. I also haven't really experimented
> > with the various wait intervals and limits in the file closer patch. But
> > since we're already looking at a 5 minute performance win, I'm content
> > with where things are.
> >
> > diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> > --- a/mercurial/changegroup.py
> > +++ b/mercurial/changegroup.py
> > @@ -657,12 +657,14 @@ def changegroup(repo, basenodes, source)
> >      # to avoid a race we use changegroupsubset() (issue1320)
> >      return changegroupsubset(repo, basenodes, repo.heads(), source)
> >
> >  def addchangegroupfiles(repo, source, revmap, trp, pr, needfiles):
> > -    if True:
> > +    closer = util.filecloser(repo.ui)
> > +    try:
> >          revisions = 0
> >          files = 0
> >          while True:
> > +            closer.verifystate()
> >              chunkdata = source.filelogheader()
> >              if not chunkdata:
> >                  break
> >              f = chunkdata["filename"]
> > @@ -670,9 +672,9 @@ def addchangegroupfiles(repo, source, re
> >              pr()
> >              fl = repo.file(f)
> >              o = len(fl)
> >              try:
> > -                if not fl.addgroup(source, revmap, trp):
> > +                if not fl.addgroup(source, revmap, trp,
> closecb=closer.close):
> >                      raise util.Abort(_("received file revlog group is
> empty"))
> >              except error.CensoredBaseError as e:
> >                  raise util.Abort(_("received delta base is censored:
> %s") % e)
> >              revisions += len(fl) - o
> > @@ -687,8 +689,10 @@ def addchangegroupfiles(repo, source, re
> >                          raise util.Abort(
> >                              _("received spurious file revlog entry"))
> >                  if not needs:
> >                      del needfiles[f]
> > +    finally:
> > +        closer.cleanup()
> >          repo.ui.progress(_('files'), None)
> >
> >      for f, needs in needfiles.iteritems():
> >          fl = repo.file(f)
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150930/bc80b05f/attachment.html>


More information about the Mercurial-devel mailing list