[PATCH 2 of 4] obsstore: disable garbage collection during initialisation (issue4456)

Matt Mackall mpm at selenic.com
Fri Dec 5 14:50:02 CST 2014


On Thu, 2014-12-04 at 12:49 +0100, Antoine Pitrou wrote:
> On Wed, 03 Dec 2014 20:00:23 -0600
> Matt Mackall <mpm at selenic.com> wrote:
> > On Thu, 2014-12-04 at 01:44 +0100, Antoine Pitrou wrote:
> > > On Wed, 03 Dec 2014 17:31:25 -0600
> > > Matt Mackall <mpm at selenic.com> wrote:
> > > > On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote:
> > > > > 
> > > > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote:
> > > > > > On Sat, 29 Nov 2014 17:57:42 -0800
> > > > > > Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
> > > > > >> # HG changeset patch
> > > > > >> # User Pierre-Yves David <pierre-yves.david at fb.com>
> > > > > >> # Date 1417049911 28800
> > > > > >> #      Wed Nov 26 16:58:31 2014 -0800
> > > > > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a
> > > > > >> # Parent  04eb7e49d2b6f90f71aa85de9ad0b4d70670d688
> > > > > >> obsstore: disable garbage collection during initialisation (issue4456)
> > > > > >>
> > > > > >> Python garbage collection is triggered by container creation, so code that
> > > > > >> creates a lot of tuples tends to trigger GC a lot. We disable the GC during
> > > > > >> obsolescence marker parsing and associated initialization. This provides an
> > > > > >> interesting speedup (25%).
> > > > > >>
> > > > > >> On my 58758 markers repo:
> > > > > >> before: 0.468247 seconds
> > > > > >> after:  0.344362 seconds
> > > > > >
> > > > > > Which Python version is that?
> > > > > 
> > > > > Python 2.7.8 (default, Oct 18 2014, 12:50:18)
> > > > > [GCC 4.9.1]
> > > > 
> > > > Python's GC behavior while building large containers is quite
> > > > questionable (aka quadratic).
> > > 
> > > It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/
> > > 
> > > If you have a reproducer, please open an issue at bugs.python.org.
> > 
> > You're right, the quadratic behavior appears to be gone in 2.7, where the
> > overhead is down to O(n). Even so, generic construction of lists of
> > tuples gets about 4x slower with the collector enabled, so it's still
> > definitely worth disabling for any potentially large list or dict we
> > might build, and obsolete markers are in that category.
> 
> Thanks. I'm wondering if there would be a way to improve our current
> heuristics to alleviate such issues.
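
For reference, the technique the patch applies (switching the cyclic collector off while bulk-building containers, then switching it back on) can be sketched as below. This is a minimal Python 3 sketch, not the patch's actual code: `gc_disabled`, `build_markers` and `timed` are hypothetical names, and the tuple layout only loosely imitates an obsolescence marker. No timings are claimed; the gap depends on the interpreter version.

```python
import contextlib
import gc
import time


@contextlib.contextmanager
def gc_disabled():
    """Temporarily disable the cyclic garbage collector.

    The previous state is restored on exit, so using this inside code that
    already disabled the collector does not re-enable it too early.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def build_markers(n):
    # Hypothetical stand-in for marker parsing: every iteration allocates a
    # new tuple, and each container allocation counts toward the next
    # generation-0 collection while the GC is enabled.
    return [(i, b"prec", (b"succ",), 0.0, ()) for i in range(n)]


def timed(fn, *args):
    # Measure only the construction; the result is freed after the timer stops.
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    del result
    return elapsed


if __name__ == "__main__":
    n = 500_000
    t_on = timed(build_markers, n)
    with gc_disabled():
        t_off = timed(build_markers, n)
    print(f"GC enabled:  {t_on:.3f}s")
    print(f"GC disabled: {t_off:.3f}s")
```

Restoring the previous state, rather than calling `gc.enable()` unconditionally, keeps the helper safe to nest and safe for callers that run with the collector permanently off.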

Maybe adapt the GC threshold in proportion to the number of active
objects seen at the last GC: "You've got a million objects already? I
shouldn't make a fuss when you create another 100000 objects."
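
The suggestion above could be approximated at user level with CPython's `gc.callbacks` hook (Python 3.3+, so not available on the 2.7 discussed here). This is a minimal sketch, assuming CPython's classic three-generation collector. `AdaptiveGC` is a hypothetical helper, not an existing Python or Mercurial API. The ratio of 10 simply mirrors the "million objects / another 100000" example. `gc.get_objects()` counts only GC-tracked containers, which are also what the generation-0 threshold counts.

```python
import gc


class AdaptiveGC:
    """Scale the generation-0 threshold with the size of the tracked heap.

    Hypothetical helper: after each full collection it counts the objects
    the collector tracks and allows roughly live // ratio new container
    allocations before the next young-generation collection.
    """

    def __init__(self, ratio=10):
        self.ratio = ratio
        # The interpreter's current threshold is the floor, so a small heap
        # keeps the default behaviour.
        self.base = gc.get_threshold()[0]

    def threshold_for(self, live_objects):
        # E.g. 1,000,000 live objects with ratio=10 -> threshold of 100,000.
        return max(self.base, live_objects // self.ratio)

    def _callback(self, phase, info):
        # Re-tune only after full (generation 2) collections: those already
        # visit every tracked object, so counting them again adds a
        # proportional cost instead of a new per-allocation one.
        if phase == "stop" and info["generation"] == 2:
            live = len(gc.get_objects())
            _, t1, t2 = gc.get_threshold()
            gc.set_threshold(self.threshold_for(live), t1, t2)

    def install(self):
        gc.callbacks.append(self._callback)

    def uninstall(self):
        gc.callbacks.remove(self._callback)
        _, t1, t2 = gc.get_threshold()
        gc.set_threshold(self.base, t1, t2)
```

Calling `AdaptiveGC().install()` early at startup would leave the default threshold in place until a full collection has observed a large heap, and only then loosen it. An in-interpreter version would use the collector's internal counts instead of materialising the `gc.get_objects()` list.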

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list