[PATCH 2 of 4] obsstore: disable garbage collection during initialisation (issue4456)

Pierre-Yves David pierre-yves.david at ens-lyon.org
Sun Nov 30 01:46:25 CST 2014



On 11/29/2014 10:29 PM, Martin von Zweigbergk wrote:
>
>
> On Sat Nov 29 2014 at 5:58:52 PM Pierre-Yves David
> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david at fb.com
>     <mailto:pierre-yves.david at fb.com>>
>     # Date 1417049911 28800
>     #      Wed Nov 26 16:58:31 2014 -0800
>     # Node ID d0f3dac4ea2b4aff51946c7db0834a__a4e5c3e82a
>     # Parent  04eb7e49d2b6f90f71aa85de9ad0b4__d70670d688
>     obsstore: disable garbage collection during initialisation (issue4456)
>
>     Python garbage collection is triggered by contained creation. So
>     code that
>     creates a lot of tuple tends to trigger GC a lot. We disable the gc
>     during
>     obsolescence marker parsing and associated initialization.
>
>
> On your repo, do you know the peak memory usage before and after? My gut
> feeling is that the amount of garbage that's now not collected until
> later is nothing to be concerned about, but it would be nice to confirm
> that.

no relevant peak detected.

>
>     The provide and
>     interesting speedup (25%).
>
>     On my 58758 markers repo:
>     before: 0.468247 seconds
>     after:  0.344362 seconds
>
>
> Do you see similar values from 'time hg ...'? Without trying to
> understand the patch, it sounds like you could just be moving the GC out
> of the method that's being timed. I just want to make sure it's an
> overall gain.

I do see similar gain, the overall time is much more unstable, but I get 
about the same benefit 0.05s

>
>     Thanks goes to Siddharth Agarwal for the lead.
>
>     diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
>     --- a/mercurial/obsolete.py
>     +++ b/mercurial/obsolete.py
>     @@ -66,10 +66,11 @@ The file starts with a version header:
>       The header is followed by the markers. Marker format depend of the
>     version. See
>       comment associated with each format for details.
>
>       """
>       import struct
>     +import gc
>       import util, base85, node
>       import phases
>       from i18n import _
>
>       _pack = struct.pack
>     @@ -466,12 +467,28 @@ class obsstore(object):
>               self.sopener = sopener
>               data = sopener.tryread('obsstore')
>               self._version = defaultformat
>               self._readonly = readonly
>               if data:
>     -            self._version, markers = _readmarkers(data)
>     -            self._load(markers)
>     +            # Python's garbage collector triggers a GC each time a
>     certain
>     +            # number of container objects (the number being defined by
>     +            # gc.get_threshold()) are allocated. Markers parsing
>     creates
>     +            # multiple tuples while parsing each markers so the gc
>     is triggered
>     +            # a lot while parsing an high number of markers. As a
>     workaround,
>     +            # disable GC during initialisation.
>
>
> What was gc.get_threshold() in your case? Do you know how much memory
> was allocated? How many times GC was forced before this patch?

Nope. I did not digged that far. The whole thing should be reimplemented 
in C anyway.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list