[PATCH 3 of 3] Fetch revision log on demand (in batches of 50 revisions)

Daniel Holth dholth at fastmail.fm
Mon Jul 2 07:55:01 CDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brendan Cully wrote:
> On Sunday, 01 July 2007 at 21:47, Daniel Holth wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> Edouard Gomez wrote:
>>> Merged the convert_svn.rev method bit into
>>> ./svn/enh-add-svn-converter.patch.
>>>
>>> I advise putting the 50-revision batch constant somewhere
>>> near the top so we can easily try a few values on big repos
>>> (mplayer has more than 23000 commits).
>> I have had fine results with an svnsync of the Subversion
>> repository, with upwards of 7500 revs at a time.
>>
>> As for fetching the log in pieces, it should be quite simple to
>> measure the point at which the overhead of requesting a log
>> exceeds the cost of fetching the log messages themselves. If the
>> 1.4.2 bindings really run out of RAM with just 100 revs, then we
>> could play with better memory pool management, explicitly closing
>> memory pools and so forth. Memory pool management is one of the
>> things that has been made more automatic in the bindings over time.
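
Edouard's suggestion of a batch-size constant near the top, combined with fetching the log in pieces, could look roughly like this. It is a minimal sketch, not the patch itself: `LOG_BATCH_SIZE`, `iter_log_batches` and the `fetch_log(start, end)` callback are hypothetical names, and the callback stands in for whatever call into the Subversion bindings actually returns log entries for an inclusive revision range.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical tunable, kept at module level so other values are easy to
# try on big repositories (the patch under discussion uses 50).
LOG_BATCH_SIZE = 50

# Illustrative shape of one log entry: (revision number, log message).
LogEntry = Tuple[int, str]


def iter_log_batches(
    fetch_log: Callable[[int, int], List[LogEntry]],
    first_rev: int,
    last_rev: int,
    batch_size: int = LOG_BATCH_SIZE,
) -> Iterator[LogEntry]:
    """Yield log entries for first_rev..last_rev (inclusive), asking
    fetch_log for at most batch_size revisions per call."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    start = first_rev
    while start <= last_rev:
        end = min(start + batch_size - 1, last_rev)
        # One request per batch: fewer round trips than one per revision,
        # far less memory than one request for the whole history.
        for entry in fetch_log(start, end):
            yield entry
        start = end + 1
```

Because this is a generator, no batch is requested until the caller actually iterates that far, which fits the "on demand" goal of the patch.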
>
> We probably should be releasing memory pools when we can. As for
> the 50-rev batch size, I would probably optimise in the other
> direction: use the smallest batch possible before the overhead of
> calling get_log becomes noticeable.
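
The "smallest batch before the overhead becomes noticeable" rule can be made concrete with a simple cost model. The sketch assumes, hypothetically, that each log call costs a fixed overhead o plus a per-revision cost c, both of which would have to be measured on the repository in question. The overhead's share of a call covering n revisions is o / (o + n*c), and requiring that share to be at most s gives n >= o*(1 - s) / (s*c). The function names are illustrative only.

```python
import math


def overhead_share(overhead: float, per_rev: float, batch: int) -> float:
    """Fraction of one log call's cost spent on the fixed overhead."""
    return overhead / (overhead + batch * per_rev)


def smallest_batch(overhead: float, per_rev: float, max_share: float) -> int:
    """Smallest batch size whose overhead share is at most max_share.

    Solves overhead / (overhead + n * per_rev) <= max_share for n,
    i.e. n >= overhead * (1 - max_share) / (max_share * per_rev).
    """
    if overhead < 0 or per_rev <= 0 or not 0 < max_share < 1:
        raise ValueError("need overhead >= 0, per_rev > 0, 0 < max_share < 1")
    n = max(1, math.ceil(overhead * (1 - max_share) / (max_share * per_rev)))
    # Guard against floating-point rounding right at the boundary.
    while overhead_share(overhead, per_rev, n) > max_share:
        n += 1
    return n
```

For example, with a made-up 0.37 s of overhead per call and 7 ms per revision, keeping the overhead under 10% calls for batches of 476 revisions.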
>
>> We also really need to invent a way for the main converter to
>> provide scratch space for convert_svn (a shelve file in dest/.hg/
>> would be fine). If the converter_source implementation even had a
>> way to look at the shamap file, which happens to encode the last
>> branch/revision we fetched, that would also be a great clue about
>> which revisions we need to log for.
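
Using the shamap as a clue might amount to finding, for each branch, the highest revision already converted, so that logging can start just after it. This sketch assumes that each shamap line is "<source id> <destination id>" and that Subversion source ids look like "svn:<repository uuid><module>@<revnum>"; both are assumptions about the file format, and `last_converted_revisions` is a hypothetical helper.

```python
from typing import Dict, Iterable


def last_converted_revisions(shamap_lines: Iterable[str]) -> Dict[str, int]:
    """Return {module: highest converted revision} from shamap lines.

    Assumed line format: "<source id> <destination id>", with Subversion
    source ids shaped like "svn:<repository uuid><module>@<revnum>".
    Lines that do not match are ignored.
    """
    latest: Dict[str, int] = {}
    for line in shamap_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("svn:"):
            continue
        location, sep, revnum = fields[0][len("svn:"):].rpartition("@")
        if not sep or not revnum.isdigit():
            continue
        # A uuid contains no "/", so the module path starts at the first "/".
        slash = location.find("/")
        module = location[slash:] if slash != -1 else ""
        rev = int(revnum)
        if rev > latest.get(module, -1):
            latest[module] = rev
    return latest
```

A converter source could then ask for log entries starting at `latest[module] + 1` instead of from revision 1.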
>
> Probably true, but I don't see a big downside to simply doing the
> fetch from getcommit - walktree will only ask for revisions as they
> are needed. Frankly, it could even be deferred until later, since
> we're not currently doing anything sensible with the parent
> information. We could defer the actual fetch until an attribute
> other than parents is requested. But I think we're going to need
> better parent info soon if branches are going to work right.
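
Deferring the fetch until an attribute other than parents is requested could be done with Python's `__getattr__` hook, which runs only when normal attribute lookup fails. This is a minimal sketch and not the converter's real commit class: `LazyCommit` is a hypothetical name, and `fetch(rev)` is an assumed callback returning the remaining metadata (for example, by asking the server for that revision's log).

```python
from typing import Any, Callable, Dict, List, Optional


class LazyCommit:
    """Commit whose parents are known up front but whose other metadata
    (author, date, description, ...) is fetched on first access."""

    def __init__(self, rev: str, parents: List[str],
                 fetch: Callable[[str], Dict[str, Any]]) -> None:
        self.rev = rev
        self.parents = parents
        self._fetch = fetch
        self._data: Optional[Dict[str, Any]] = None

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails, so never for rev/parents.
        if name.startswith("_"):
            # Avoid recursion if private attributes are not set yet.
            raise AttributeError(name)
        if self._data is None:
            self._data = self._fetch(self.rev)  # the one deferred request
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None
```

Walking the history through `parents` never triggers a request; the first access to something like `commit.author` does, and later accesses reuse the cached result.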

To properly respect network bandwidth, convert_svn must ask for all
the files in a particular revision at once, as a delta from the
previous revision. It doesn't even begin to attempt this at the
moment. But I agree with deferring all of the get-log operations
until the last possible moment.
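
The idea of fetching a revision as a delta from the previous one is that a single request returns only the paths that changed, and everything else is carried over. The sketch below models the delta as a plain dictionary, where a value of None marks a deleted path; that representation and the name `apply_revision_delta` are hypothetical. Subversion's own RA layer delivers such deltas through editor callbacks instead.

```python
from typing import Dict, Mapping, Optional


def apply_revision_delta(
    previous: Mapping[str, bytes],
    delta: Mapping[str, Optional[bytes]],
) -> Dict[str, bytes]:
    """Return a revision's file contents, given the previous revision's
    contents and a delta that lists only the changed paths.

    A value of None in delta means the path was deleted.
    """
    current = dict(previous)  # unchanged files are reused, not refetched
    for path, contents in delta.items():
        if contents is None:
            current.pop(path, None)
        else:
            current[path] = contents
    return current
```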

Why do you say that branches don't work right? They work fine for me;
I just have to run the script once per branch into the same target
Mercurial repository, and no revision is converted more than once. The
history of merges done in Subversion is not recorded, of course, but
the source doesn't provide that information anyway (unless we are
converting revisions made with svnmerge.py or svk).
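
The reason per-branch runs into one target never convert a revision twice can be sketched as a set lookup. This assumes each run can see the source revision ids already recorded in the target (in practice, the shamap) and that those ids are stable across runs; `revisions_to_convert` is a hypothetical helper.

```python
from typing import Iterable, List, MutableSet


def revisions_to_convert(branch_revs: Iterable[str],
                         converted: MutableSet[str]) -> List[str]:
    """Return this branch's revisions that are not yet converted, in order,
    and record them as converted."""
    todo: List[str] = []
    for rev in branch_revs:
        if rev in converted:
            continue  # already converted by an earlier run (shared history)
        todo.append(rev)
        converted.add(rev)
    return todo
```

A branch copied from trunk shares trunk's early history, so a second run for that branch only picks up the revisions made on the branch itself.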

bzr-svn solves the branches problem with a class that describes the
branch style, and it can supply a different one for each repository
layout.
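
A pluggable class that describes the branch layout might look something like the following. This is not bzr-svn's actual API: `BranchScheme`, `TrunkScheme` and `SingleBranchScheme` are hypothetical names, and each scheme only answers one question, namely which branch a repository path belongs to.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class BranchScheme(ABC):
    """Describes how a repository lays out its branches."""

    @abstractmethod
    def split(self, path: str) -> Optional[Tuple[str, str]]:
        """Split a repository path into (branch path, path inside branch),
        or return None if the path is not inside any branch."""


class TrunkScheme(BranchScheme):
    """The conventional trunk/, branches/<name>/, tags/<name>/ layout."""

    def split(self, path: str) -> Optional[Tuple[str, str]]:
        parts = [p for p in path.strip("/").split("/") if p]
        if parts and parts[0] == "trunk":
            return "trunk", "/".join(parts[1:])
        if len(parts) >= 2 and parts[0] in ("branches", "tags"):
            return "/".join(parts[:2]), "/".join(parts[2:])
        return None


class SingleBranchScheme(BranchScheme):
    """Treats the whole repository as a single branch."""

    def split(self, path: str) -> Optional[Tuple[str, str]]:
        return "", path.strip("/")
```

A converter would pick one scheme per repository, either from user configuration or by guessing from the top-level directories.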
>
> I haven't yet seen anything particularly unworkable in the bindings
> that are actually in distros now. And I think it's a lot
> friendlier to work with those if we can.

http://bazaar-vcs.org/BzrForeignBranches/Subversion has the patches.
I'm not quite sure what they fixed.

- - Daniel Holth
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGiPWkVh4W2pVfoMsRAlqlAKDHYcLS4krmdqA6JP3NssoYS5oToACghkEx
29tGqMGRYILpcNseVsB+lGw=
=NWlR
-----END PGP SIGNATURE-----


