cvs2hg memory use

Greg Ward greg-hg at gerg.ca
Mon Aug 3 16:47:18 CDT 2009


Hi all --

I have gotten cvs2hg to the point where I'm actually trying it out by
converting a moderate-sized CVS repository (3,600 files, 16,000
commits). The memory profile of cvs2hg varies dramatically depending
on which version of the Mercurial API I run it against:

1.1.1: starts around 23 MB, slowly creeps up to 33 MB after 4000 commits
1.2.1: similar
1.3.1: starts around 24 MB, but grows very quickly: 388 MB after 1300 commits,
          dies with MemoryError after ~1300 MB and ~2400 commits

(All figures taken from the DRS column of 'ps' output on CentOS 5.3.
The "slow creep" is expected, since I keep a list of all committed
node IDs in memory.)

The heart of cvs2hg is this method:

  def _commit_memctx(self, mctx):
    # XXX should I be wrapping my txn in weakref.proxy()?
    txn = self.repo.transaction()
    try:
      node = self.repo.commitctx(mctx)
      txn.close()
    finally:
      del txn
    return node

Every commit goes through that.

One oddity: cvs2svn disables garbage collection because it takes pains
to create no cyclic data structures.  Mercurial presumably takes no
such pains, and would no doubt benefit from occasional GC.  In fact,
I'm going to add a gc.collect() call right at the end of the above
method and see if it helps.
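
To illustrate why that might matter, here is a minimal sketch, independent
of cvs2hg and Mercurial. The Node class and the function name are made up
for illustration, and it assumes CPython's reference-counting behaviour.
With the collector disabled, an object that is part of a reference cycle
stays alive after its last outside reference goes away, and is only freed
by an explicit gc.collect():

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for any object caught in a reference cycle."""

    def __init__(self):
        # The object refers to itself, so its refcount never reaches zero.
        self.me = self


def cycle_survives_without_gc():
    """Return (alive_before_collect, alive_after_collect) for a cyclic object."""
    was_enabled = gc.isenabled()
    gc.disable()  # mimic a program that turns the cyclic collector off
    try:
        node = Node()
        ref = weakref.ref(node)  # lets us observe the object without keeping it alive
        del node  # drop the only outside reference
        alive_before = ref() is not None
        gc.collect()  # explicit collection breaks the cycle
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        # Restore whatever collector state the caller had.
        if was_enabled:
            gc.enable()
```

If Mercurial 1.3 creates cycles per commit where earlier versions did not,
this is the kind of garbage that would pile up with GC off. Whether that is
actually what happens here is still only a guess.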

But is this an expected change with 1.3?  Is it worth bisecting to see
where it happened?

Greg

