[PATCH 3 of 3] Fetch revision log on demand (in batches of 50 revisions)

Sun Jul 1 20:57:52 CDT 2007

On Sunday, 01 July 2007 at 21:47, Daniel Holth wrote:
> Edouard Gomez wrote:
> > Merged the convert_svn.rev method bit into
> > ./svn/enh-add-svn-converter.patch.
> >
> > I advise putting the 50 revision fetching constant somewhere near the top
> > so we can try a few values easily for big repos (mplayer > 23000 commits)
> I have fine results with a svnsync of The Subversion Repository, with
> upwards of 7500 revs at a time.
> 
> As for fetching the log in pieces, it should be quite simple to
> measure the point where the overhead of asking for a log is more than
> getting the log messages themselves. If 1.4.2 bindings run out of ram
> with 100 revs??? then we could play with better memory pool
> management, explicitly closing memory pools and so forth. The memory
> pool management is one of the things that has been made more automatic
> in the bindings over time.

We probably should be releasing memory pools when we can. As for the
50 rev batch number, I would maybe optimise in the other direction:
use as small a batch as possible before the overhead of calling
get_log becomes noticeable.
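
To make the batch size easy to play with, something along these
lines might do. This is only a rough sketch: BATCH_SIZE and
batch_ranges are names made up for illustration, not anything in
convert_svn, and the actual get_log call would be made once per
range it yields.

```python
# Sketch only: BATCH_SIZE and batch_ranges are hypothetical names.
BATCH_SIZE = 50  # single place to tune the number of revisions per fetch


def batch_ranges(start, end, size=BATCH_SIZE):
    """Yield inclusive (first, last) revision ranges covering start..end."""
    if size < 1:
        raise ValueError("size must be >= 1")
    first = start
    while first <= end:
        # The final batch may be shorter than size.
        last = min(first + size - 1, end)
        yield first, last
        first = last + 1
```

Timing one log request per range for a few values of size should
show where the per-call overhead starts to dominate.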

> We also really need to invent a way for the main converter to provide
> scratch space for convert_svn (a shelve file in dest/.hg/ would be
> fine). If the converter_source implementation had a way to even look
> at the shamap file, which happens to encode the last branch/revision
> we fetched, it would also be a great clue about which revisions we
> need to log for.

Probably true, but I don't see a big downside to simply doing the
fetch from getcommit - walktree will only ask for revisions as they
are needed. Frankly it could even be deferred until later, since we're
not currently doing anything sensible with the parent information. We
could defer the actual fetch until an attribute other than parents is
requested. But I think we're going to have to get better parent info
soon, if branches are going to work right.
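
For the "defer until an attribute other than parents is requested"
idea, a minimal sketch could look like the following. LazyCommit and
the fetch callable are hypothetical, not the converter's actual
commit class; fetch is assumed to take a revision and return a dict
of the remaining attributes.

```python
class LazyCommit(object):
    """Commit stand-in that fetches log data on first real use.

    Hypothetical illustration: fetch(rev) is assumed to return a dict
    such as {'author': ..., 'desc': ...}.
    """

    def __init__(self, rev, parents, fetch):
        self.rev = rev
        self.parents = parents  # available without touching the log
        self._fetch = fetch
        self._data = None

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, so rev and
        # parents never trigger a fetch.
        if name.startswith('_'):
            raise AttributeError(name)
        if self._data is None:
            self._data = self._fetch(self.rev)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)
```

Reading parents costs nothing here, while the first access to
something like author triggers exactly one fetch for that revision.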

> And if it is really a problem to merely fetch the log, it's not any
> slower to parse the XML output of "svn log". The big wins from using
> the API come from not reconnecting to the source repository so many
> times, and from not actually typing "svn update" on some working copy
> for each fetched revision.

Another possibility is to defer the directory->files lookup until
getchanges. That's one of the bigger costs of fetching the whole log
at once.
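
Roughly, the log pass would only record the changed paths, and the
expansion of directories into files would happen when getchanges is
called. A sketch under those assumptions follows; DeferredChanges and
list_files are invented names, and the return value here is
simplified compared to what the real getchanges returns.

```python
class DeferredChanges(object):
    """Record changed paths cheaply; expand directories only on demand.

    Hypothetical illustration: list_files(rev, dirpath) is assumed to
    return the file paths under dirpath at revision rev.
    """

    def __init__(self, list_files):
        self._list_files = list_files
        self._paths = {}  # rev -> list of (path, is_dir)

    def record(self, rev, path, is_dir):
        # Called while reading the log; does no repository lookup.
        self._paths.setdefault(rev, []).append((path, is_dir))

    def getchanges(self, rev):
        # The expensive directory listing happens here, per revision.
        files = []
        for path, is_dir in self._paths.get(rev, []):
            if is_dir:
                files.extend(self._list_files(rev, path))
            else:
                files.append(path)
        return sorted(set(files))
```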

> I should mention that there's a lot to be said for using the *working
> bindings*...  they aren't that hard to get...

I haven't yet seen anything particularly unworkable in the bindings
that are actually in distros now. And I think it's a lot friendlier to
work with those if we can.