[PATCH 3 of 6] patch: migrate to util.iterfile

Jun Wu quark at fb.com
Tue Nov 15 10:46:18 EST 2016


Excerpts from Yuya Nishihara's message of 2016-11-16 00:39:46 +0900:
> On Tue, 15 Nov 2016 15:14:09 +0000, Jun Wu wrote:
> > Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +0000:
> > > Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +0000:
> > > > 
> > > > On 11/15/2016 01:38 PM, Jun Wu wrote:
> > > > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900:
> > > > >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote:
> > > > >>> # HG changeset patch
> > > > >>> # User Jun Wu <quark at fb.com>
> > > > >>> # Date 1479165246 0
> > > > >>> #      Mon Nov 14 23:14:06 2016 +0000
> > > > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace
> > > > >>> # Parent  8097b8c66952003addd5b683a14265c1d3cc201f
> > > > >>> # Available At https://bitbucket.org/quark-zju/hg-draft  
> > > > >>> #              hg pull https://bitbucket.org/quark-zju/hg-draft    -r d1637df5ffd9
> > > > >>> patch: migrate to util.iterfile
> > > > >>
> > > > >> Not all read()s are interruptible by signals. I don't remember the detail, but
> > > > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a
> > > > >> (local) disk is not a slow device according to this definition; I/O operations
> > > > >> on disk devices are not interrupted by signals."
> > > > >
> > > > > Good to know! I guess modern OS have similar decisions, although POSIX seems
> > > > > to be not helpful here.
> > > > >
> > > > >> Since readline() doesn't have cache in CPython layer, it would be slightly
> > > > >> slower than fp.__iter__().
> > > > >
> > > > > A quick benchmark shows it's ~4x slower.
> > > > >
> > > > > Maybe make the function smarter to only use the workaround for a pipe?
> > > > >
> > > > >   if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode):
> > > > >       return iter(fp.readline, '')
> 
> No idea if it can cover all cases. (e.g. i don't know if tty is the "slow"
> device.)
> 
> > > > Should we drop this series from hg-committed or a followup is enough?
> > > 
> > > +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we
> > > support Python 2.6, we are at risk that any fileobj.read() is broken.
> > 
> > Actually, I think the worker issue is still worthwhile to fix (the current
> > situation is better than none anyway). So I will send follow-ups:
> > 
> >   - Fix util.iterfile perf
> >   - Only use SIGCHILD handler in worker, if Python >= 2.7.4
> 
> It sounds good to patch worker.py which is known to have the problem, and we
> wouldn't care if the read of progress messages gets 4x slower.

Given the fact that you can do "hg import - < file". I think it's worthwhile
to patch all places in the code base. The patch I'm currently writing will
special case S_ISREG files so it should have little perf impact.


More information about the Mercurial-devel mailing list