[PATCH 1 of 2 V2] log: flush stdout/err for the first log entries

Sat Sep 26 03:03:25 CDT 2015

On Fri, 25 Sep 2015 13:08:09 -0500, Matt Mackall wrote:
> On Thu, 2015-09-24 at 16:11 -0700, Durham Goode wrote:
> > On 9/24/15 3:51 PM, Matt Mackall wrote:
> > > On Thu, 2015-09-24 at 13:02 -0700, Durham Goode wrote:
> > >> On 9/24/15 12:49 PM, Matt Mackall wrote:
> > >>> On Thu, 2015-09-24 at 12:20 -0700, Durham Goode wrote:
> > >>>> # HG changeset patch
> > >>>> # User Durham Goode <durham at fb.com>
> > >>>> # Date 1443118388 25200
> > >>>> #      Thu Sep 24 11:13:08 2015 -0700
> > >>>> # Node ID 7a9aaaadaed9887a956cb6a1e79be527e6f4a20f
> > >>>> # Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
> > >>>> log: flush stdout/err for the first log entries
> > >>>>
> > >>>> There have been problems with the pager where we get no results until the python
> > >>>> buffer has been filled. Let's fix that by manually flushing the buffer for the
> > >>>> first 50 commits from log. 50 was chosen because it will likely fill the users
> > >>>> screen and doesn't introduce a significant overhead.
> > >>>>
> > >>>> The overhead of this is negiligble. I see no perf difference when running log on
> > >>>> 100,000 commits.
> > >>> Still confused as to why this isn't in the ui code.
> > >> This is in addition to the ui code.  I have a patch that adjusts the ui
> > >> code to flush every N seconds, but it causes some tests to fail (since
> > >> they didn't on the timing of the progress bar output) so I sent these
> > >> patches first.
> > >>
> > >> We still need these 2 patches though.  Imagine a scenario where you run
> > >> hg log on a directory that has only had one commit total (and it was
> > >> recent). Immediately one commit is printed into the buffer (before the
> > >> flush time limit is hit).  Then half an hour goes buy while we scan the
> > >> rest of the repository for another commit in that directory.  We end up
> > >> not showing the user that commit until the command is completely
> > >> finished.  Printing the first N units prevents this experience.
> > > Right. You can't make the timer thing really work without a thread or
> > > something. So given that's known to be somewhat broken and that this
> > > flushing problem isn't in any way log-specific, why don't we just put a
> > > counter and a flush in ui.write()?
> > >
> > > I just did this, saw a pretty repeatable 7.2->7.3s change for hg log on
> > > the hg repo:
> > >
> > > diff -r b80b2ee71a08 mercurial/ui.py
> > > --- a/mercurial/ui.py	Thu Sep 24 00:34:15 2015 -0700
> > > +++ b/mercurial/ui.py	Thu Sep 24 17:45:42 2015 -0500
> > > @@ -104,6 +104,7 @@
> > >           self._ucfg = config.config() # untrusted
> > >           self._trustusers = set()
> > >           self._trustgroups = set()
> > > +        self._outcount = 0
> > >           self.callhooks = True
> > >   
> > >           if src:
> > > @@ -617,6 +618,9 @@
> > >           else:
> > >               for a in args:
> > >                   self.fout.write(str(a))
> > > +            if self._outcount < 50:
> > > +                self.fout.flush()
> > > +                self._outcount += 1
> > >   
> > >       def write_err(self, *args, **opts):
> > >           self._progclear()
> > >
> > > ("hg log -k sullivan -T." is a pretty good behavior test here.)

So we want line-buffering for first 50 lines?

I've timed it with line-buffered stdout. It's slightly slower than fully-
buffered io, but perhaps we don't care it if we use the pager.

  % time env LANG=C HGRCPATH=/dev/null ./hg log --config extensions.pager= \
  --config pager.pager=cat --pager=always --config ui.formatted=True \
  -r0:10000 > /dev/null

  fully-buffered: 1.88s user 0.06s system 100% cpu 1.940 total
  line-buffered:  2.09s user 0.10s system 101% cpu 2.150 total

diff --git a/hgext/pager.py b/hgext/pager.py
--- a/hgext/pager.py
+++ b/hgext/pager.py
@@ -66,6 +66,14 @@ from mercurial.i18n import _
 testedwith = 'internal'
 
 def _pagersubprocess(ui, p):
+    # XXX hack to initialize buffering mode of stdout to the default of tty
+    # (i.e. IOLBF) before dup2()
+    sys.stdout.write('\r')
+    sys.stdout.flush()
+    # or call setvbuf() explicitly (needs this if stdout is /dev/null)
+    #import _chgutil
+    #_chgutil.setfilebufmode(sys.stdout, _chgutil.IOLBF)
+
     pager = subprocess.Popen(p, shell=True, bufsize=-1,
                              close_fds=util.closefds, stdin=subprocess.PIPE,
                              stdout=sys.stdout, stderr=sys.stderr)