cvs2hg memory use
Greg Ward
greg-hg at gerg.ca
Tue Aug 4 08:51:49 CDT 2009
[me, yesterday]
> As expected, a gc.collect() after every commit keeps memory use quite
> constant. A more reasonable approach, countermanding cvs2svn's
> default with a call to gc.enable() at the beginning of the
> conversion, works almost as well but presumably with less of a
> performance hit. (But the memory profile in this case is still
> noticeably worse than under 1.1 or 1.2 without GC: DRS creeps as high
> as 50 MB after 4000 commits, versus 33 MB under Hg 1.1.)
[...]
> Still curious what changed in 1.3. I might write a little test script
> and bisect this anyways, just for the heck of it.
At Dirkjan's urging, I wrote a test script called commitloop (I'll
paste it below). There's no apparent leak or reference cycle with
commitloop. I only see leaky behaviour inside cvs2hg, under Mercurial
1.3, with GC disabled.
I tried running commitloop with Mercurial 1.1.2, 1.2.1, and 1.3.1 --
same each time. Steady-state memory usage might be a smidge higher
under 1.3, but the profile is the same: nice and flat, no leak.
In my test script, I tried detecting cycles as follows: with GC
disabled, call gc.collect() after every commit and print its return
value (which is "number of unreachable objects" according to the
docs). Nothing -- almost. After the first commit, it reports 9
objects; every subsequent commit reports 0 unreachable.
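To illustrate that check outside of Mercurial, here is a minimal standalone sketch. It is not part of commitloop; Node, make_cycle, make_acyclic and count_unreachable are hypothetical names made up for the example. With GC disabled, a workload that creates no cycles should report 0 after every iteration, while one that does create cycles reports a nonzero count each time.

```python
import gc


class Node(object):
    """Hypothetical object, used only to build (or not build) a cycle."""
    def __init__(self):
        self.other = None


def make_cycle():
    # a -> b -> a: reference counting alone can never free these two.
    a, b = Node(), Node()
    a.other = b
    b.other = a


def make_acyclic():
    # a -> b only: both objects are freed by refcounting on return.
    a, b = Node(), Node()
    a.other = b


def count_unreachable(work, iterations=5):
    """Run work() repeatedly with automatic GC disabled and record what
    gc.collect() returns (number of unreachable objects found) each time."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()  # flush any garbage that predates the measurement
        counts = []
        for _ in range(iterations):
            work()
            counts.append(gc.collect())
        return counts
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("cyclic:  %r" % (count_unreachable(make_cycle),))
    print("acyclic: %r" % (count_unreachable(make_acyclic),))
```

The exact nonzero count depends on the interpreter version (instance dictionaries may or may not be counted separately), so only "zero versus nonzero" is meaningful here.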
So there must be something about cvs2hg that causes a ref cycle, but
only with Mercurial 1.3. Odd.
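One way to narrow that down, sketched here under assumptions rather than something I have run against cvs2hg: the standard gc module's DEBUG_SAVEALL flag makes gc.collect() keep every unreachable object in gc.garbage instead of freeing it, so the types involved in a cycle can be tallied. The helper garbage_types and the Leaky class below are hypothetical; in practice the work argument would be whatever performs one commit.

```python
import gc


def garbage_types(work):
    """Run work() once with automatic GC off, then return a dict mapping
    type name -> number of unreachable objects of that type."""
    was_enabled = gc.isenabled()
    old_flags = gc.get_debug()
    gc.disable()
    gc.collect()          # start from a clean slate
    del gc.garbage[:]
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        work()
        gc.collect()      # unreachable objects now land in gc.garbage
        tally = {}
        for obj in gc.garbage:
            name = type(obj).__name__
            tally[name] = tally.get(name, 0) + 1
        return tally
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]
        gc.collect()      # actually free the saved cycles
        if was_enabled:
            gc.enable()


class Leaky(object):
    """Hypothetical stand-in for whatever ends up in a cycle."""
    def __init__(self):
        self.peer = None


def one_step():
    a, b = Leaky(), Leaky()
    a.peer = b
    b.peer = a


if __name__ == "__main__":
    for name, count in sorted(garbage_types(one_step).items()):
        print("%-12s %d" % (name, count))
```

The tally only says which types are caught in cycles, not who created them, but that is usually enough of a hint about where to look next.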
Anyways... here is my commitloop script:
----------------------------------------------------------------------------
#!/usr/bin/env python

'''Create a repository and repeatedly commit to it. Designed to
simulate what conversion tools do in order to evaluate performance and
memory usage.'''

import sys
import os
import gc
import optparse

#sys.path.insert(0, os.path.dirname(os.path.dirname(sys.argv[0])))
from mercurial import ui, hg, node, context, __version__ as hgversion

def main():
    parser = optparse.OptionParser(usage="%prog outrepo",
                                   description=__doc__)
    (options, args) = parser.parse_args()
    if not args:
        parser.error("not enough arguments")

    print "Mercurial version: %s" % hgversion.version
    gc.disable()

    repodir = args[0]
    create = not (os.path.isdir(os.path.join(repodir, '.hg')))
    repo = hg.repository(ui.ui(), repodir, create=create)

    random = open("/dev/urandom", "rb")
    def getfilectx(repo, mctx, path):
        return context.memfilectx(
            path, random.read(1024*100), False, False, False)

    previous = None
    files = ["foo"]
    idx = len(repo)
    lock = repo.lock()
    try:
        while True:
            mctx = context.memctx(
                repo, (previous, -1), 'rev %d' % idx, files, getfilectx)
            tr = repo.transaction()
            try:
                previous = repo.commitctx(mctx)
                tr.close()
            finally:
                del tr
            print "committed %d:%s" % (idx, node.short(previous))
            idx += 1
            #nobj = gc.collect()
            #print "GC collected %d unreachable objects" % nobj
    finally:
        lock.release()

try:
    main()
except KeyboardInterrupt:
    sys.exit("interrupted")
----------------------------------------------------------------------------