[PATCH] util: improve iterfile so it impacts little on performance
Jun Wu
quark at fb.com
Tue Nov 15 12:43:00 EST 2016
Excerpts from Jun Wu's message of 2016-11-15 16:04:13 +0000:
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1479225350 0
> # Tue Nov 15 15:55:50 2016 +0000
> # Node ID 3cd2e9873bc1d565300b629e72100800075d12bb
> # Parent d1a0a64f6e16432333bea0476098c46a61222b9b
> # Available At https://bitbucket.org/quark-zju/hg-draft
> # hg pull https://bitbucket.org/quark-zju/hg-draft -r 3cd2e9873bc1
> util: improve iterfile so it impacts little on performance
>
> We have performance concerns about "iterfile" because it is 4X slower on
> normal files. Meanwhile, modern systems have the nice property that reading
> a "fast" (on-disk) file cannot be interrupted, which we should make use of.
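The distinction between "fast" and "slow" files can be seen with `os.fstat` and the `stat` module. Below is a minimal Python 3 sketch for POSIX systems.

- `classify_fd` is a hypothetical helper, not part of Mercurial.
- Only the `regular` label corresponds to the `S_ISREG` check the patch relies on.
- The other labels are the descriptor kinds the patch describes as EINTR-prone: pipes, sockets and ttys (character devices).

```python
import os
import socket
import stat
import tempfile


def classify_fd(fd):
    """Return a coarse file-type label for an open file descriptor.

    Hypothetical helper: the patch treats only 'regular' files as "fast";
    pipes, sockets and character devices (e.g. ttys) may see EINTR.
    """
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return 'regular'
    if stat.S_ISFIFO(mode):
        return 'pipe'
    if stat.S_ISSOCK(mode):
        return 'socket'
    if stat.S_ISCHR(mode):
        return 'chardev'  # ttys fall in this category
    return 'other'


if __name__ == '__main__':
    with tempfile.TemporaryFile() as f:
        print(classify_fd(f.fileno()))  # regular
    r, w = os.pipe()
    print(classify_fd(r))  # pipe
    os.close(r)
    os.close(w)
    a, b = socket.socketpair()
    print(classify_fd(a.fileno()))  # socket
    a.close()
    b.close()
```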
>
> This patch documents the relevant knowledge in comments and tries to
> minimize the performance impact: it only uses the slower but safer approach
> for non-normal files. It gives up for Python < 2.7.4 because the slower
> approach does not make a difference in terms of safety there.
I think I can fix Python < 2.7.4 using a slower code path as well,
so I'm dropping this one. A new version is coming.
> It also avoids the workaround for Python >= 3 and PyPy, which don't have
> the EINTR issue.
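The implementation/version gate described above can be written as a pure function, which makes its boundaries easy to check. This is a minimal Python 3 sketch with these assumptions:

- The name `needs_eintr_workaround` is hypothetical.
- It takes the implementation name and version as arguments instead of reading `sys.version_info` directly, as the patch does.
- It writes `< (3, 0)` where the patch writes `<= (3, 0)`. The two agree for every real `sys.version_info`, since even `(3, 0, 0, ...)` compares greater than `(3, 0)`.

```python
import platform
import sys


def needs_eintr_workaround(implementation, version):
    """Return True where the readline-based workaround applies.

    Mirrors the patch's condition: only CPython 2 at 2.7.4 or later, where
    file.readline() already handles EINTR. Older CPython 2 gains nothing
    from the workaround; PyPy and CPython 3 do not have the issue.
    """
    return (implementation == 'CPython' and
            (2, 7, 4) <= tuple(version[:3]) < (3, 0))


if __name__ == '__main__':
    # False on any Python 3 interpreter
    print(needs_eintr_workaround(platform.python_implementation(),
                                 sys.version_info))
```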
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -25,8 +25,10 @@ import hashlib
> import imp
> import os
> +import platform as pyplatform
> import re as remod
> import shutil
> import signal
> import socket
> +import stat
> import string
> import subprocess
> @@ -2191,8 +2193,31 @@ def wrap(line, width, initindent='', han
>      return wrapper.fill(line).encode(encoding.encoding)
>
> -def iterfile(fp):
> -    """like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
> -    known to have such issues."""
> -    return iter(fp.readline, '')
> +if (pyplatform.python_implementation() == 'CPython' and
> +    sys.version_info <= (3, 0) and sys.version_info >= (2, 7, 4)):
> +    # There is an issue with CPython 2 that file.__iter__ does not handle EINTR
> +    # correctly. CPython <= 2.7.12 is known to have the issue.
> +    # In CPython >= 2.7.4, file.read, file.readline etc. deal with EINTR
> +    # correctly so we can use the workaround below. However the workaround is
> +    # about 4X slower than the native iterator because the latter does
> +    # readahead caching in the CPython layer.
> +    # On modern systems like Linux, the "read" syscall cannot be interrupted
> +    # for reading "fast" files like on-disk files. So the EINTR issue only
> +    # affects things like pipes, sockets, ttys etc. We treat "normal" (S_ISREG)
> +    # files approximately as "fast" files and use the fast (unsafe) code path.
> +    def iterfile(fp):
> +        fastpath = True
> +        try:
> +            fastpath = stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
> +        except (AttributeError, OSError): # no fileno, or stat fails
> +            pass
> +        if fastpath:
> +            return fp
> +        else:
> +            return iter(fp.readline, '')
> +else:
> +    # For CPython < 2.7.4, the workaround wouldn't make things better.
> +    # PyPy and CPython 3 do not have the EINTR issue; no workaround needed.
> +    def iterfile(fp):
> +        return fp
>
> def iterlines(iterator):
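The dispatch in the patch can be tried outside Mercurial with a minimal Python 3 sketch. Assumptions:

- The file object is opened in binary mode, so the `readline` EOF sentinel is `b''` rather than the `''` used by the Python 2 patch.
- The name `iterfile_sketch` is hypothetical.

On Python 3.5+ the workaround is not actually required, because PEP 475 makes the interpreter retry system calls interrupted by EINTR. The sketch only shows which path each kind of file would take.

```python
import os
import stat
import tempfile


def iterfile_sketch(fp):
    """Return an iterator over the lines of a binary file object fp.

    Regular (on-disk) files use the fast native iterator; anything else
    (pipes, sockets, ttys) falls back to repeated readline() calls. As in
    the patch, objects without a usable fileno() or whose fstat() fails
    take the fast path (io.UnsupportedOperation subclasses OSError).
    """
    fastpath = True
    try:
        fastpath = stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
    except (AttributeError, OSError):  # no fileno, or stat fails
        pass
    if fastpath:
        return fp
    # b'' is what binary-mode readline() returns at EOF
    return iter(fp.readline, b'')


if __name__ == '__main__':
    with tempfile.TemporaryFile() as f:
        f.write(b'a\nb\n')
        f.seek(0)
        print(iterfile_sketch(f) is f)  # True: regular file, fast path
    r, w = os.pipe()
    os.write(w, b'x\ny\n')
    os.close(w)
    with os.fdopen(r, 'rb') as p:
        print(list(iterfile_sketch(p)))  # [b'x\n', b'y\n']
```

Returning `fp` itself on the fast path keeps the native readahead buffering. The `iter(callable, sentinel)` form gives up that buffering in exchange for per-line `readline()` calls, which is the trade-off the patch comments describe.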
More information about the Mercurial-devel mailing list