cvs2hg memory use

Greg Ward greg-hg at gerg.ca
Tue Aug 4 08:51:49 CDT 2009


[me, yesterday]
> As expected, a gc.collect() after every commit keeps memory use quite
> constant.  A more reasonable approach, countermanding cvs2svn's
> default with a call to gc.enable() at the beginning of the
> conversion, works almost as well but presumably with less of a
> performance hit. (But the memory profile in this case is still
> noticeably worse than under 1.1 or 1.2 without GC: DRS creeps as high
> as 50 MB after 4000 commits, versus 33 MB under Hg 1.1.)
[...]
> Still curious what changed in 1.3.  I might write a little test script
> and bisect this anyways, just for the heck of it.

At Dirkjan's urging, I wrote a test script called commitloop (I'll
paste it below).  There's no apparent leak or reference cycle with
commitloop.  I only see leaky behaviour inside cvs2hg, under Mercurial
1.3, with GC disabled.

I tried running commitloop with Mercurial 1.1.2, 1.2.1, and 1.3.1 --
same each time.  Steady-state memory usage might be a smidge higher
under 1.3, but the profile is the same: nice and flat, no leak.
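
For anyone who wants to watch the memory profile from inside the process rather than from ps, here is a minimal sketch. It is not part of commitloop: peak_rss() and watch() are made-up names, and it samples peak resident set size via the standard resource module, which is not the same measurement as the DRS figures quoted above. A flat profile shows up as a peak that stops growing after the first few samples.

```python
import resource


def peak_rss():
    """Return this process's peak resident set size from getrusage().

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def watch(step, iterations, every=100):
    """Call step() repeatedly and return (iteration, peak RSS) samples.

    step is a hypothetical stand-in for "do one commit".
    """
    samples = []
    for i in range(iterations):
        step()
        if i % every == 0:
            samples.append((i, peak_rss()))
    return samples
```

Since this reports a high-water mark, it can show that memory keeps climbing but not that it was later given back.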

In my test script, I tried detecting cycles as follows: with GC
disabled, call gc.collect() after every commit and print its return
value (which is "number of unreachable objects" according to the
docs).  Nothing -- almost.  After the first commit, it reports 9
objects; every subsequent commit reports 0 unreachable.
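
The check described above can be sketched in isolation, without Mercurial. This is only an illustration under assumptions: Node, make_cycle(), make_plain() and count_unreachable() are hypothetical names, and the exact counts that gc.collect() returns depend on the Python version, so the only thing to rely on is zero versus non-zero.

```python
import gc


class Node(object):
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None


def make_cycle():
    # a and b refer to each other, so reference counting alone
    # cannot free them once the function returns.
    a = Node()
    b = Node()
    a.other = b
    b.other = a


def make_plain():
    # No cycle: freed by reference counting as soon as we return.
    Node()


def count_unreachable(work, iterations):
    """Run work() with automatic GC off and return what gc.collect()
    reports after each iteration."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()  # discard garbage left over from earlier code
        counts = []
        for _ in range(iterations):
            work()
            counts.append(gc.collect())
        return counts
    finally:
        if was_enabled:
            gc.enable()
```

Code that leaves cycles behind gives a non-zero count on every iteration; code that does not gives 0 every time, which is what commitloop shows after its first commit.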

So there must be something about cvs2hg that causes a ref cycle, but
only with Mercurial 1.3.  Odd.

Anyways... here is my commitloop script:

----------------------------------------------------------------------------
#!/usr/bin/env python

'''Create a repository and repeatedly commit to it.  Designed to
simulate what conversion tools do in order to evaluate performance and
memory usage.'''

import sys
import os
import gc
import optparse

#sys.path.insert(0, os.path.dirname(os.path.dirname(sys.argv[0])))
from mercurial import ui, hg, node, context, __version__ as hgversion

def main():
    parser = optparse.OptionParser(usage="%prog outrepo",
                                   description=__doc__)
    (options, args) = parser.parse_args()
    if not args:
        parser.error("not enough arguments")

    print "Mercurial version: %s" % hgversion.version
    # Run with GC disabled, as cvs2svn does by default.
    gc.disable()

    repodir = args[0]
    create = not (os.path.isdir(os.path.join(repodir, '.hg')))
    repo = hg.repository(ui.ui(), repodir, create=create)

    random = open("/dev/urandom", "rb")
    def getfilectx(repo, mctx, path):
        return context.memfilectx(
            path, random.read(1024*100), False, False, False)

    previous = None
    files = ["foo"]
    idx = len(repo)
    lock = repo.lock()
    try:
        while True:
            mctx = context.memctx(
                repo, (previous, -1), 'rev %d' % idx, files, getfilectx)

            tr = repo.transaction()
            try:
                previous = repo.commitctx(mctx)
                tr.close()
            finally:
                del tr
            print "committed %d:%s" % (idx, node.short(previous))
            idx += 1

            #nobj = gc.collect()
            #print "GC collected %d unreachable objects" % nobj
    finally:
        lock.release()

try:
    main()
except KeyboardInterrupt:
    sys.exit("interrupted")
----------------------------------------------------------------------------

