[PATCH 2 of 4] obsstore: disable garbage collection during initialisation (issue4456)

Martin von Zweigbergk martinvonz at google.com
Sun Nov 30 00:29:48 CST 2014


On Sat Nov 29 2014 at 5:58:52 PM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at fb.com>
> # Date 1417049911 28800
> #      Wed Nov 26 16:58:31 2014 -0800
> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a
> # Parent  04eb7e49d2b6f90f71aa85de9ad0b4d70670d688
> obsstore: disable garbage collection during initialisation (issue4456)
>
> Python garbage collection is triggered by contained creation. So code that
> creates a lot of tuple tends to trigger GC a lot. We disable the gc during
> obsolescence marker parsing and associated initialization.


On your repo, do you know the peak memory usage before and after? My gut
feeling is that the amount of garbage that's now not collected until later
is nothing to be concerned about, but it would be nice to confirm that.


> The provide and
> interesting speedup (25%).
>
> On my 58758 markers repo:
> before: 0.468247 seconds
> after:  0.344362 seconds
>

Do you see similar values from 'time hg ...'? Without trying to understand
the patch, it sounds like you could just be moving the GC out of the method
that's being timed. I just want to make sure it's an overall gain.


> Thanks goes to Siddharth Agarwal for the lead.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -66,10 +66,11 @@ The file starts with a version header:
>  The header is followed by the markers. Marker format depend of the
> version. See
>  comment associated with each format for details.
>
>  """
>  import struct
> +import gc
>  import util, base85, node
>  import phases
>  from i18n import _
>
>  _pack = struct.pack
> @@ -466,12 +467,28 @@ class obsstore(object):
>          self.sopener = sopener
>          data = sopener.tryread('obsstore')
>          self._version = defaultformat
>          self._readonly = readonly
>          if data:
> -            self._version, markers = _readmarkers(data)
> -            self._load(markers)
> +            # Python's garbage collector triggers a GC each time a certain
> +            # number of container objects (the number being defined by
> +            # gc.get_threshold()) are allocated. Markers parsing creates
> +            # multiple tuples while parsing each markers so the gc is
> triggered
> +            # a lot while parsing an high number of markers. As a
> workaround,
> +            # disable GC during initialisation.
>

What was gc.get_threshold() in your case? Do you know how much memory was
allocated? How many times GC was forced before this patch?


> +            #
> +            # This would probably marker parsing during exchange but I do
> not
> +            # expect the order of magnitude to matter outside of
> initialisation
> +            # case.
> +            gcenabled = gc.isenabled()
> +            gc.disable()
> +            try:
> +                self._version, markers = _readmarkers(data)
> +                self._load(markers)
> +            finally:
> +                if gcenabled:
> +                    gc.enable()
>
>      def __iter__(self):
>          return iter(self._all)
>
>      def __len__(self):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20141130/561f5964/attachment.html>


More information about the Mercurial-devel mailing list