[PATCH RFC] revlog/changegroup: use callbacks for populating efiles

Matt Mackall mpm at selenic.com
Sat Jul 11 15:17:27 CDT 2015


On Sat, 2015-07-11 at 11:26 -0700, Gregory Szorc wrote:
> On Fri, Jul 10, 2015 at 5:03 PM, Matt Mackall <mpm at selenic.com> wrote:
> 
> > On Fri, 2015-07-10 at 16:46 -0700, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.szorc at gmail.com>
> > > # Date 1436571961 25200
> > > #      Fri Jul 10 16:46:01 2015 -0700
> > > # Node ID bca84aa6ea80b386a01c245e7f4e4ebdb720cd8c
> > > # Parent  648323f41a89619d9eeeb7287213378c340866c8
> > > revlog/changegroup: use callbacks for populating efiles
> >
> > Probably needs to use the flush() thingy that the nearby censor code is
> > using.
> >
> 
> Calling flush() makes the error go away!
> 
> While my SSD equipped MBP didn't appear to exhibit a slowdown unbundling
> mozilla-central, flushing file descriptors after applying each changeset
> feels a bit excessive and it makes me uncomfortable due to potential
> performance implications. I just don't think an extra I/O related syscall
> should be required for this feature.

The flush is only a flush from the userspace file buffers to kernel-side
buffers, it won't actually force I/O. It'll still be lazily written by
the kernel's caching layer at roughly the same rate as without the
buffering because the userspace buffers are tiny (4k) relative a modern
system's disk cache. But yes, it is a syscall and syscalls aren't free.

> But you have a stronger grasp on these things than me. Is calling flush()
> after every changeset add acceptable? Keep in mind this will be the only
> behavior for addchangegroup(), regardless of whether progress bars are
> displayed.

Well, it can be conditional on there being a callback.

In the slightly bigger picture, we might want to make the revlog object
aware of the file handles it might need to flush so that rather than
explicitly flushing on the write side, the read side can flush only if
needed. Something like:

for f in self._flushonread:
    f.flush()

..with the appropriate try/finally around the writers.

In the even bigger still picture, we might want to make it optionally
possible for revlog to hold onto file handles for the object's lifetime
to reduce syscall count and filesystem lookup overhead. Historically we
don't do this because Windows.

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list