Turning cvs2{svn,git} into cvs2hg

Greg Ward greg at gerg.ca
Sun Jul 19 11:03:12 CDT 2009

On Sun, Jul 19, 2009 at 7:23 AM, Dirkjan Ochtman<dirkjan at ochtman.nl> wrote:
> On Fri, Jul 17, 2009 at 15:47, Greg Ward<greg at gerg.ca> wrote:
>> I can see three ways to fix that mismatch:
>> 1) modify the way cvs2git generates fixup commits so that
>> hg-fastimport does not create pathological Mercurial repos
>> 2) write a filter that turns the fastimport dump created by cvs2git
>> into something that hg-fastimport + Mercurial handle nicely
>> 3) modify hg-fastimport to handle those fixup commits directly
>> I think #1 benefits the most people, since it could potentially make
>> life simpler for git-fastimport as well.  (They could, in theory,
>> eventually drop support for the implementation quirk that cvs2git
>> takes advantage of.  Unfortunately, they promoted that implementation
>> quirk to a documented part of the syntax when cvs2git started using
>> it, so that seems unlikely.)  (Michael H.: this is my brief summary of
>> a thread on the git mailing list that you pointed out to me a few
>> months ago; if I'm summarizing inaccurately, my apologies.)
> I'd be interested to hear what the quirk is.

I don't remember the precise details (which is why I was vague in my
original email).  But I think there are two mismatches right now
between cvs2git's output and hg-fastimport.  The shallow mismatch is
that "fixup" commits (needed to turn CVS tag and branch points into a
single revision/commit/changeset -- i.e. you need this feature whether
your target is svn, git, bzr, or hg) (ab)use the fastimport syntax in
a peculiar way, by claiming to be merges when they aren't really.  Or
something like that.  git-fast-import handles them correctly thanks to
an implementation coincidence, and that is the coincidence that was
promoted to official syntax.  hg-fastimport currently does not handle
them very well, and sooner or later someone is going to have to fix
that, since it's now officially part of the syntax.  But it would be
easier on hg-fastimport in the short term to change the output of
cvs2git, as long as git-fast-import gives the same result.

The deeper mismatch is that every fixup commit becomes its own little
head.  That's where my "4000-head monster repo" came from: 4000 CVS
tags that did not correspond exactly to a point in history and
therefore required fixup commits.  (I don't think branch points that
require a fixup commit are a problem, since they are of course the
root of a new branch, which will eventually terminate in a head.)  If
you want an accurate conversion, then I don't see how you can get away
from fixup commits.  So this is really a mismatch between CVS' insane
tagging model (which cvs2git is reflecting accurately) and Hg's
preference for "not too many heads, please".
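
To make the head count concrete, here is a minimal sketch of why each
tag fixup commit ends up as a head.  The history, the commit names, and
the find_heads helper are all hypothetical; this is not code from
cvs2git or hg-fastimport, just an illustration of "a head is a
changeset with no children".

```python
def find_heads(parents):
    """Return the changesets that are not the parent of any other changeset."""
    has_child = set()
    for node, node_parents in parents.items():
        has_child.update(node_parents)
    return sorted(node for node in parents if node not in has_child)


# Hypothetical history: a three-commit mainline plus two tag fixup commits.
# Nothing is ever committed on top of a tag fixup, so each one stays a head.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "fixup-TAG_A": ["c1"],
    "fixup-TAG_B": ["c2"],
}

heads = find_heads(parents)
print(heads)
```

With N tags that each need a fixup commit, the same reasoning gives N
extra heads on top of the real branch heads.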

> When I last looked at the
> fast-import format, it looked like something that would suit Mercurial
> quite well. I hear bzr has taken it up as well.

Yes and yes.  hg-fastimport is heavily based on bzr-fastimport.  (Much
of the work I have done is in extracting a common library, rather than
maintaining a fork of an old version of the parser in bzr-fastimport.)

> It would be awesome to
> have a good format to exchange data between at least the current
> crop of DVCSs, e.g. bzr, git and hg. Getting a good fast-import tool
> would then allow deprecation of our custom bzr and git import code.

YES and YES.  hg-fastimport should definitely beat 'hg convert' on git
input, since it preserves branch names.  (Assuming you want git
branches to become hg branches.)  I haven't tried it in a while
though, since I'm sadly obsessed with the CVS->Hg problem.

> I've run Benoit's revlog reordering script on the Python repo with
> great success. You should try it for your case as well.

I did, back when I didn't know what the heck a manifest was or why
reordering it would make such a difference.  It worked, but I didn't
like the fact that I didn't understand why or how it worked.  Should
try again now that I actually understand these things.  ;-)

> If that works
> for you, I'd posit that toposorting the changelog as well wouldn't
> make that much of a difference. Actually, post-facto toposorting would
> probably not be extremely hard, and it would be useful to have a
> script for it in general.

Hmmm.  Neat idea.  Especially considering that I have a fourth
toposort algorithm sitting in my patch queue (minor variation on the
default: not quite as space efficient, but smarter about branch
order).  Having these things tucked away in hgext/convert is a bit
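
For reference, a post-facto toposort of a changeset graph could look
roughly like the sketch below.  This is plain Kahn's algorithm with
alphabetical tie-breaking, not any of the algorithms in hgext/convert,
not the variation in my patch queue, and not Benoit's reordering
script.  The parents mapping and the toposort name are assumptions made
for illustration, and every parent is assumed to appear as a key.

```python
from collections import deque


def toposort(parents):
    """Order changesets so that every parent precedes its children."""
    children = {node: [] for node in parents}
    pending = {}  # number of parents not yet emitted, per changeset
    for node, node_parents in parents.items():
        pending[node] = len(node_parents)
        for parent in node_parents:
            children[parent].append(node)

    # Start from the roots; sorting keeps the result deterministic.
    ready = deque(sorted(node for node, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(children[node]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("cycle in history")
    return order


# Hypothetical history: "d" merges "b" and "c", which both descend from "a".
history = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
order = toposort(history)
print(order)
```

A smarter variant would change only how the next ready changeset is
picked, for example to keep each branch contiguous.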

>>  * hg-fastimport might go back to being a neglected and unloved extension
>>    if I concentrate on cvs2hg
> Maybe I ought to take it up, then. Are there any actual users out there?

Hard to say.  There probably would be if it worked better.
Installation/setup is tricky right now because of the dependency on
pyfastimport (the common library that I factored out).  Not sure of
the best way to improve that.

Anyways: rather than making users play follow-the-repo, why don't I
just give you write access to the bitbucket mirror?  Do you have a
bitbucket account?

