[PATCH 1 of 2 V2] log: flush stdout/err for the first log entries

Sat Sep 26 17:45:45 CDT 2015

On 09/26/2015 01:03 AM, Yuya Nishihara wrote:
> On Fri, 25 Sep 2015 13:08:09 -0500, Matt Mackall wrote:
>> On Thu, 2015-09-24 at 16:11 -0700, Durham Goode wrote:
>>> On 9/24/15 3:51 PM, Matt Mackall wrote:
>>>> On Thu, 2015-09-24 at 13:02 -0700, Durham Goode wrote:
>>>>> On 9/24/15 12:49 PM, Matt Mackall wrote:
>>>>>> On Thu, 2015-09-24 at 12:20 -0700, Durham Goode wrote:
>>>>>>> # HG changeset patch
>>>>>>> # User Durham Goode <durham at fb.com>
>>>>>>> # Date 1443118388 25200
>>>>>>> #      Thu Sep 24 11:13:08 2015 -0700
>>>>>>> # Node ID 7a9aaaadaed9887a956cb6a1e79be527e6f4a20f
>>>>>>> # Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
>>>>>>> log: flush stdout/err for the first log entries
>>>>>>>
>>>>>>> There have been problems with the pager where we get no results until the python
>>>>>>> buffer has been filled. Let's fix that by manually flushing the buffer for the
>>>>>>> first 50 commits from log. 50 was chosen because it will likely fill the users
>>>>>>> screen and doesn't introduce a significant overhead.
>>>>>>>
>>>>>>> The overhead of this is negiligble. I see no perf difference when running log on
>>>>>>> 100,000 commits.
>>>>>> Still confused as to why this isn't in the ui code.
>>>>> This is in addition to the ui code.  I have a patch that adjusts the ui
>>>>> code to flush every N seconds, but it causes some tests to fail (since
>>>>> they didn't on the timing of the progress bar output) so I sent these
>>>>> patches first.
>>>>>
>>>>> We still need these 2 patches though.  Imagine a scenario where you run
>>>>> hg log on a directory that has only had one commit total (and it was
>>>>> recent). Immediately one commit is printed into the buffer (before the
>>>>> flush time limit is hit).  Then half an hour goes buy while we scan the
>>>>> rest of the repository for another commit in that directory.  We end up
>>>>> not showing the user that commit until the command is completely
>>>>> finished.  Printing the first N units prevents this experience.
>>>> Right. You can't make the timer thing really work without a thread or
>>>> something. So given that's known to be somewhat broken and that this
>>>> flushing problem isn't in any way log-specific, why don't we just put a
>>>> counter and a flush in ui.write()?
>>>>
>>>> I just did this, saw a pretty repeatable 7.2->7.3s change for hg log on
>>>> the hg repo:
>>>>
>>>> diff -r b80b2ee71a08 mercurial/ui.py
>>>> --- a/mercurial/ui.py	Thu Sep 24 00:34:15 2015 -0700
>>>> +++ b/mercurial/ui.py	Thu Sep 24 17:45:42 2015 -0500
>>>> @@ -104,6 +104,7 @@
>>>>            self._ucfg = config.config() # untrusted
>>>>            self._trustusers = set()
>>>>            self._trustgroups = set()
>>>> +        self._outcount = 0
>>>>            self.callhooks = True
>>>>
>>>>            if src:
>>>> @@ -617,6 +618,9 @@
>>>>            else:
>>>>                for a in args:
>>>>                    self.fout.write(str(a))
>>>> +            if self._outcount < 50:
>>>> +                self.fout.flush()
>>>> +                self._outcount += 1
>>>>
>>>>        def write_err(self, *args, **opts):
>>>>            self._progclear()
>>>>
>>>> ("hg log -k sullivan -T." is a pretty good behavior test here.)
>
> So we want line-buffering for first 50 lines?
>
> I've timed it with line-buffered stdout. It's slightly slower than fully-
> buffered io, but perhaps we don't care it if we use the pager.

Look slike you got your hand on the way to control the buffer setting. \o/

We should definitely goes for line (or probably even none) buffering for 
the pager pipe (the same as what we do for no-pager).

-- 
Pierre-Yves David