SVN convert usability

Brendan Cully brendan at kublai.com
Thu Jul 12 16:23:07 CDT 2007


On Thursday, 12 July 2007 at 21:08, Edouard Gomez wrote:
> I think much work has been done lately to get a nice SVN conversion
> module through the convert extension.
> 
> But it's clearly impossible to convert repos with more than 20000
> revisions now. I used to get the mplayer repo converted from an
> svnsynced local svn repo by going through runs of 2000 revisions. But
> since the last changes to the SVN backend, each time it analyzes the
> branch revisions it goes nuts trying to parse 23000 revisions at once,
> eats 2 GB of RAM over 6 hours, and ends up OOM-killed by the kernel.
> 
> Is there some clue where that memory goes? It's hard to believe 23000
> revisions would eat 2 GB. I tried using subpools instead of root pools
> in our loops, because that's what the Apache manuals advise, but it
> didn't help. I saw some reference releases in our code, and I've also
> tried clearing the pools from time to time, still with no improvement.
> 
> Any idea on that topic?
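
If I read you right, the subpool discipline you tried looks roughly
like the sketch below (an illustration only, using the raw pool
helpers from the Subversion SWIG bindings; fetch_revision stands in
for whatever per-revision call actually touches the bindings):

    from svn import core

    def fetch_revision(rev, pool):
        # placeholder for the real per-revision work
        pass

    pool = core.svn_pool_create(None)
    for rev in range(1, 23001):
        # one child pool per revision, destroyed right away so the
        # per-revision allocations don't pile up in the root pool
        subpool = core.svn_pool_create(pool)
        fetch_revision(rev, subpool)
        core.svn_pool_destroy(subpool)
    core.svn_pool_destroy(pool)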

My _guess_ is that we're holding indirect references to a bunch of the
memory, e.g. through self.files. I've been meaning to get rid of most of
that cache, especially since hg maintains its own commitcache to shadow
ours (I think I've fixed that one, though).
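
Concretely, the kind of cache I mean looks like the sketch below; the
names (files, getchanges, _scanrev) are illustrative, not the actual
convert code. The point is that popping entries once they have been
consumed lets Python free them instead of keeping every revision's
file list alive at once:

    class SvnSource(object):
        def __init__(self):
            # rev -> list of changed paths; if nothing ever removes
            # entries, all 23000 file lists stay resident together
            self.files = {}

        def _scanrev(self, rev):
            # placeholder for the real walk over one SVN revision
            return []

        def getchanges(self, rev):
            changes = self.files.pop(rev, None)  # pop, not get: drop the
            if changes is None:                  # cached reference once
                changes = self._scanrev(rev)     # the caller has used it
            return changes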

Another problem is that we calculate the file list very early, for
every revision we visit. We should push that down to getchanges, and
not cache the result. I've seen getfile hog lots of memory too, though,
so I think it's also possible the Python bindings are simply buggy and
don't release memory properly. The worst case here is to fork off a
subprocess to collect some of this info in batches, but that's really
not appealing.
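
If it ever comes to that, the batching fallback is nothing more exotic
than a short-lived child process doing the binding-heavy work and then
exiting, so whatever the bindings leak is reclaimed by the OS. A rough
sketch, where svn_batch_helper.py is a hypothetical script that pickles
a {rev: filelist} dict to its stdout:

    import pickle
    import subprocess
    import sys

    def getfiles_batch(url, revs):
        # the child gets the URL and a batch of revision numbers;
        # any memory the bindings fail to release dies with it
        proc = subprocess.Popen(
            [sys.executable, 'svn_batch_helper.py', url] +
            [str(r) for r in revs],
            stdout=subprocess.PIPE)
        data = pickle.load(proc.stdout)
        proc.wait()
        return data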

In any case, what should work for now is to run convert in batches
using -r (hg convert -r 100, hg convert -r 200, ...).
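
A small driver for that workaround could look like this; the 23000
total and the 2000-revision step are taken from your numbers, and
svn-repo / hg-repo are placeholder paths:

    import subprocess

    total, step = 23000, 2000
    for limit in range(step, total + step, step):
        # each slice runs in a fresh hg process, so memory from the
        # previous batch is fully released before the next one starts
        subprocess.check_call(['hg', 'convert',
                               '-r', str(min(limit, total)),
                               'svn-repo', 'hg-repo'])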

