[PATCH] util: increase the copy buffer size in shutil's copyfile

Tony Tung tonytung at instagram.com
Fri May 6 17:11:02 EDT 2016


> On May 6, 2016, at 1:50 PM, Augie Fackler <raf at durin42.com> wrote:
> 
> On Fri, May 06, 2016 at 01:36:41PM -0700, Tony Tung wrote:
>> # HG changeset patch
>> # User Tony Tung <ttung at fb.com>
>> # Date 1462566734 25200
>> #      Fri May 06 13:32:14 2016 -0700
>> # Branch stable
>> # Node ID edee9b1c3a2849fcb11900959ad785b202608c4f
>> # Parent  97811ff7964710d32cae951df1da8019b46151a2
>> util: increase the copy buffer size in shutil's copyfile
>> 
>> The default copy buffer size in shutil is 16KB.  This is far too
>> conservative for modern systems.  Use a 4MB buffer instead to copy files
>> more efficiently.
>> 
>> This manifests prominently with journaling large dirstates, which can be
>> multi-second operations.
>> 
>> diff --git a/mercurial/util.py b/mercurial/util.py
>> --- a/mercurial/util.py
>> +++ b/mercurial/util.py
>> @@ -20,6 +20,7 @@
>> import collections
>> import datetime
>> import errno
>> +import functools
>> import gc
>> import hashlib
>> import imp
>> @@ -1010,6 +1011,23 @@
>> 
>>     return check
>> 
>> +# The default copy buffer in python's shutil is 16KB.  This is far too
>> +# conservative in modern systems.  Copy more at a time so we don't incur so much
>> +# system call overhead.
>> +def wrap(container, funcname, newfunc):
>> +    origfunc = getattr(container, funcname)
>> +    wrap = functools.partial(newfunc, origfunc)
>> +    functools.update_wrapper(wrap, origfunc)
>> +    setattr(container, funcname, wrap)
>> +
>> +def copyfileobj_lgbuffer(orig, *args, **kwargs):
>> +    if len(args) == 2 and 'length' not in kwargs:
>> +        args = (args[0], args[1], (4 * 1024 * 1024))
>> +
>> +    orig(*args, **kwargs)
>> +
>> +wrap(shutil, 'copyfileobj', copyfileobj_lgbuffer)
> 
> Why are you monkeypatching shutil's implementation here? I'm happy to
> have a helper method in util, but I'm deeply skeptical of
> monkeypatching the stdlib, doubly so when it's a method we don't
> currently call outside of a test.

copyfileobj is called by copyfile.  copyfile does not expose the length parameter.  Either we copypasta copyfile from shutil or we monkey patch copyfileobj’s default.  I’m not particularly wedded to either approach as each has its own downside.

Is the helper method you’re suggesting the copypaste approach?

Thanks,
Tony


More information about the Mercurial-devel mailing list