[PATCH] util: improve iterfile so it impacts little on performance

Jun Wu quark at fb.com
Tue Nov 15 12:43:00 EST 2016


Excerpts from Jun Wu's message of 2016-11-15 16:04:13 +0000:
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1479225350 0
> #      Tue Nov 15 15:55:50 2016 +0000
> # Node ID 3cd2e9873bc1d565300b629e72100800075d12bb
> # Parent  d1a0a64f6e16432333bea0476098c46a61222b9b
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 3cd2e9873bc1
> util: improve iterfile so it impacts little on performance
> 
> We have performance concerns about "iterfile" as it is 4X slower on normal
> files. Modern systems have the nice property that reading a "fast"
> (on-disk) file cannot be interrupted, and we should make use of that.
> 
> This patch dumps the related knowledge in comments and tries to minimize
> the performance impact: it only uses the slower but safer approach for
> non-normal files. It gives up for Python < 2.7.4 because the slower approach

I think I can fix Python < 2.7.4 using some slow code path as well.
So I'm dropping this one. A new version is coming.

> does not make a difference in terms of safety. And it avoids the workaround
> for Python >= 3 and PyPy, which don't have the EINTR issue.
> 
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -25,8 +25,10 @@ import hashlib
>  import imp
>  import os
> +import platform as pyplatform
>  import re as remod
>  import shutil
>  import signal
>  import socket
> +import stat
>  import string
>  import subprocess
> @@ -2191,8 +2193,31 @@ def wrap(line, width, initindent='', han
>      return wrapper.fill(line).encode(encoding.encoding)
>  
> -def iterfile(fp):
> -    """like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
> -    known to have such issues."""
> -    return iter(fp.readline, '')
> +if (pyplatform.python_implementation() == 'CPython' and
> +    sys.version_info <= (3, 0) and sys.version_info >= (2, 7, 4)):
> +    # There is an issue with CPython 2 that file.__iter__ does not handle EINTR
> +    # correctly. CPython <= 2.7.12 is known to have the issue.
> +    # In CPython >= 2.7.4, file.read, file.readline etc. deal with EINTR
> +    # correctly so we can use the workaround below. However the workaround is
> +    # about 4X slower than the native iterator because the latter does
> +    # readahead caching in CPython layer.
> +    # On modern systems like Linux, the "read" syscall cannot be interrupted
> +    # for reading "fast" files like on-disk files. So the EINTR issue only
> +    # affects things like pipes, sockets, ttys etc. We treat "normal" (S_ISREG)
> +    # files approximately as "fast" files and use the fast (unsafe) code path.
> +    def iterfile(fp):
> +        fastpath = True
> +        try:
> +            fastpath = stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
> +        except (AttributeError, OSError): # no fileno, or stat fails
> +            pass
> +        if fastpath:
> +            return fp
> +        else:
> +            return iter(fp.readline, '')
> +else:
> +    # For CPython < 2.7.4, the workaround wouldn't make things better.
> +    # PyPy and CPython 3 do not have the EINTR issue thus no workaround needed.
> +    def iterfile(fp):
> +        return fp
>  
>  def iterlines(iterator):
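
For anyone who wants to try the regular-file check outside of util.py, the
dispatch in the patch can be sketched roughly as below. This is a standalone
Python 3 illustration, not the code from the patch: the names
is_regular_file and iterfile_sketch are hypothetical, and the b'' sentinel
assumes files opened in binary mode (the patch targets CPython 2 and uses
''). As the patch comment notes, CPython 3 does not need the workaround, so
this only shows how the S_ISREG test separates on-disk files from pipes.

```python
import os
import stat
import tempfile


def is_regular_file(fp):
    """Return True if fp looks like a regular (S_ISREG) file.

    Defaults to True when fp has no usable file descriptor, mirroring
    the patch's initial fastpath = True.
    """
    try:
        return stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
    except (AttributeError, OSError):  # no fileno, or fstat fails
        return True


def iterfile_sketch(fp):
    """Use the native iterator for regular files, readline otherwise."""
    if is_regular_file(fp):
        return fp  # fast path: native iteration with readahead
    return iter(fp.readline, b'')  # slow path: one readline call per line


# A regular on-disk file takes the fast path.
with tempfile.TemporaryFile() as f:
    f.write(b'a\nb\n')
    f.seek(0)
    regular_lines = list(iterfile_sketch(f))

# A pipe is not S_ISREG, so it takes the readline-based path.
r, w = os.pipe()
with os.fdopen(w, 'wb') as wf:
    wf.write(b'a\nb\n')
with os.fdopen(r, 'rb') as rf:
    pipe_is_regular = is_regular_file(rf)
    pipe_lines = list(iterfile_sketch(rf))
```

Both paths yield the same lines; only the iteration mechanism differs, which
is where the 4X gap mentioned in the commit message comes from.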

