[PATCH 2 of 4] obsstore: disable garbage collection during initialisation (issue4456)

Siddharth Agarwal sid at less-broken.com
Mon Dec 1 18:35:30 CST 2014


On 11/29/2014 05:57 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at fb.com>
> # Date 1417049911 28800
> #      Wed Nov 26 16:58:31 2014 -0800
> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a
> # Parent  04eb7e49d2b6f90f71aa85de9ad0b4d70670d688
> obsstore: disable garbage collection during initialisation (issue4456)

We also disable the GC while loading the dirstate, for the same reason. 
Can we factor the disable-GC logic out, into a decorator perhaps? (I did 
the same thing for hgsubversion recently for a ~3x perf win on a large 
repo, and I suspect we're going to have to do this more soon.)
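
Something along these lines is what I have in mind. This is only a
rough sketch: the name "nogc" and the stand-in "parsemarkers" function
are hypothetical, not existing Mercurial code, and functools.wraps
assumes a Python version that ships it.

```python
import functools
import gc


def nogc(func):
    """Run func with the cyclic garbage collector switched off.

    Hypothetical helper: the previous GC state is restored on exit, so a
    caller that had already disabled the GC keeps it disabled.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gcenabled = gc.isenabled()
        gc.disable()
        try:
            return func(*args, **kwargs)
        finally:
            # Only re-enable if the GC was on when we were called.
            if gcenabled:
                gc.enable()
    return wrapper


@nogc
def parsemarkers(count):
    # Stand-in for marker parsing: allocates many small tuples.
    return [(i, (i + 1,), ()) for i in range(count)]
```

The obsstore and dirstate loaders could then each be wrapped instead of
repeating the try/finally block by hand.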


>
> Python garbage collection is triggered by contained

"container"

>   creation. So code that
> creates a lot of tuple tends to trigger GC a lot. We disable the gc during
> obsolescence marker parsing and associated initialization. The provide and

"This provides an"

> interesting speedup (25%).
>
> On my 58758 markers repo:
> before: 0.468247 seconds
> after:  0.344362 seconds
>
> Thanks goes to Siddharth Agarwal for the lead.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -66,10 +66,11 @@ The file starts with a version header:
>   The header is followed by the markers. Marker format depend of the version. See
>   comment associated with each format for details.
>   
>   """
>   import struct
> +import gc
>   import util, base85, node
>   import phases
>   from i18n import _
>   
>   _pack = struct.pack
> @@ -466,12 +467,28 @@ class obsstore(object):
>           self.sopener = sopener
>           data = sopener.tryread('obsstore')
>           self._version = defaultformat
>           self._readonly = readonly
>           if data:
> -            self._version, markers = _readmarkers(data)
> -            self._load(markers)
> +            # Python's garbage collector triggers a GC each time a certain
> +            # number of container objects (the number being defined by
> +            # gc.get_threshold()) are allocated. Markers parsing creates
> +            # multiple tuples while parsing each markers so the gc is triggered
> +            # a lot while parsing an high number of markers. As a workaround,
> +            # disable GC during initialisation.
> +            #
> +            # This would probably marker parsing during exchange but I do not
> +            # expect the order of magnitude to matter outside of initialisation
> +            # case.

I'm unable to parse this last sentence.

> +            gcenabled = gc.isenabled()
> +            gc.disable()
> +            try:
> +                self._version, markers = _readmarkers(data)
> +                self._load(markers)
> +            finally:
> +                if gcenabled:
> +                    gc.enable()
>   
>       def __iter__(self):
>           return iter(self._all)
>   
>       def __len__(self):
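
For anyone who wants to see the effect the code comment describes, here
is a small illustrative sketch. It is not part of the patch: the
function name "countcollections" is made up, the tuples only imitate
markers, and gc.callbacks needs Python 3.3 or later. It counts how many
collector passes start while a list of tuples is being built.

```python
import gc


def countcollections(count, disable):
    """Return (GC passes started, tuples built) for `count` fake markers."""
    passes = [0]

    def callback(phase, info):
        # The gc module calls this at the start and stop of each pass.
        if phase == "start":
            passes[0] += 1

    wasenabled = gc.isenabled()
    gc.callbacks.append(callback)
    try:
        if disable:
            gc.disable()
        else:
            gc.enable()
        # Each fake marker allocates container objects (tuples), which
        # is what moves the collector towards gc.get_threshold().
        markers = [(i, (i + 1,), ()) for i in range(count)]
    finally:
        gc.callbacks.remove(callback)
        # Put the collector back the way we found it.
        if wasenabled:
            gc.enable()
        else:
            gc.disable()
    return passes[0], len(markers)
```

Calling it with disable=False and then disable=True on the same count
shows whether any passes were triggered in each mode; the exact number
depends on the interpreter and its thresholds.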


