Turning cvs2{svn,git} into cvs2hg

Michael Haggerty mhagger at alum.mit.edu
Sun Jul 19 13:41:07 CDT 2009


Greg Ward wrote:
> [...]
> And it looks like cvs2git + hg-fastimport should be the better way.
> But there is a pretty big impedance mismatch between those two right
> now, particularly in the way that cvs2git generates "fixup" commits to
> turn CVS tags and branch points into something sensible for svn or
> git.  (In a nutshell, my "small" CVS test repo with ~40 branches
> becomes a 4000-head monster when I fastimport it into Mercurial.  If
> you want to make Mercurial crawl, give it 4000 heads.  Not pretty.
> And keep in mind that my real repository is ~8x bigger than my
> small test repo.)

IMHO the impedance mismatch has nothing to do with cvs2git vs
hg-fastimport.  It is between CVS/Subversion's model of branches and
tags vs git/Mercurial's.  (At my basic level of understanding of
Mercurial, its branching model seems quite similar to git's.  Maybe I'm
wrong.)

CVS and Subversion allow branches and tags to be used in ways that would
be considered blasphemous in the DVCS world, and people really use these
features as important parts of their workflow.  For example, in CVS you
can do things like

- Tag (or branch) a subset of files from the source branch.

- Add some files from branchA and some from branchB to a tag or branch.

- Add different files to a branch at different times (e.g., add file2 to
the branch after file1, which was already on the branch, has been modified).

- In file1, branch branchB off of branchA; in file2, branch branchA off
of branchB.

It is quite easy (and common) for a tag or a branch, at "creation", to
contain a mishmash of file revisions that never coexisted on any
"source" branch.
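
To make "never coexisted" concrete, here is a rough sketch in Python.
The data model (per-file lists of commit times and revision numbers, and
the names branch_history, live_interval and coexisted) is invented for
illustration and is not how cvs2svn represents anything internally.  A
tag is a consistent snapshot of a branch only if there was some moment
at which every tagged revision was the current one on that branch:

```python
from math import inf

# Hypothetical model: for each file, the revisions committed on one branch,
# as (commit_time, revision) pairs in increasing time order.
branch_history = {
    "file1": [(1, "1.1"), (5, "1.2")],
    "file2": [(2, "1.1"), (3, "1.2")],
}


def live_interval(history, revision):
    """Return the half-open [start, end) span during which `revision`
    was the newest revision of its file on the branch."""
    for i, (time, rev) in enumerate(history):
        if rev == revision:
            end = history[i + 1][0] if i + 1 < len(history) else inf
            return time, end
    raise KeyError(revision)


def coexisted(branch, tag):
    """True if all tagged revisions were current on `branch` at the
    same time, i.e. their live intervals have a common point."""
    start, end = -inf, inf
    for path, revision in tag.items():
        if path not in branch:
            return False
        try:
            lo, hi = live_interval(branch[path], revision)
        except KeyError:
            return False
        start, end = max(start, lo), min(end, hi)
    return start < end


# file1 1.1 was current during [1, 5) and file2 1.2 from 3 onwards: overlap.
clean_tag = {"file1": "1.1", "file2": "1.2"}
# file1 1.2 only exists from 5 onwards, file2 1.1 was replaced at 3: no overlap.
mishmash_tag = {"file1": "1.2", "file2": "1.1"}
```

In this toy model coexisted(branch_history, clean_tag) holds and
coexisted(branch_history, mishmash_tag) does not.  The second kind of
tag is the one that has no natural counterpart in a DVCS history.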

Now, the most minimal, unambitious, sine qua non requirement for a
cvs2hg conversion tool is that the results of checking out a tag or the
tip of a branch in CVS and Mercurial should be identical.  Therefore,
regardless of the tool, we need a way to represent all of the above
situations in Mercurial.  Decide that and the rest is a simple matter of
programming.
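
That minimal requirement can at least be checked mechanically.  A sketch,
assuming the same tag or branch tip has already been checked out once
from CVS and once from the converted Mercurial repository into two
directories (the function names are made up for this example, and only
the CVS and .hg bookkeeping directories are skipped; files such as
.hgtags, or differences caused by CVS keyword expansion, would need
extra handling):

```python
import hashlib
import os

# Per-tool bookkeeping directories that are not project content.
METADATA_DIRS = {"CVS", ".hg"}


def tree_digest(root):
    """Map each file's path (relative to `root`) to the SHA-256 of its
    contents, skipping version-control metadata directories."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def checkouts_match(cvs_dir, hg_dir):
    """True if both checkouts contain the same files with the same bytes."""
    return tree_digest(cvs_dir) == tree_digest(hg_dir)
```

Running checkouts_match over every tag and every branch tip would give
a crude acceptance test for any candidate conversion strategy.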

Another issue that has not been resolved satisfactorily is what to
record in the DAG for CVS branches that do not start as 1:1 copies of a
source branch.  Should a source branch be chosen to be the parent of the
new branch anyway?

And what should happen to the DAG if files are added from a source
branch to an existing branch?  Should the commit be considered a merge
with the new source branch as the second parent, even though the source
branch might have other content that wasn't merged over?  Currently,
cvs2* always creates a merge commit whenever any content is added from
one branch to another, but I am skeptical that this is the best behavior.
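
To illustrate the choice, here is a sketch of two possible rules for
picking the parents of such a commit.  The first mirrors the behavior
described above; the second is only one conceivable alternative, not
something cvs2svn implements, and all names and arguments are
hypothetical:

```python
def parents_always_merge(branch_tip, source_tip, copied, source_files):
    """Rule as described above: any content copied from the source
    branch makes the commit a merge with the source as second parent."""
    return [branch_tip, source_tip] if copied else [branch_tip]


def parents_full_copy_only(branch_tip, source_tip, copied, source_files):
    """Hypothetical stricter rule: record a merge only if every file
    on the source branch was carried over."""
    if copied and set(copied) == set(source_files):
        return [branch_tip, source_tip]
    return [branch_tip]
```

With source_files = ["file1", "file2"] and copied = ["file2"], the first
rule yields two parents and the second yields one.  The stricter rule
keeps the DAG from claiming a merge that did not really happen, at the
cost of no longer recording where file2 came from.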

It could very well be that Mercurial has no way of representing a
general CVS repository, or that the only way is prohibitively
inefficient.  If so, there would be no way to migrate from CVS to
Mercurial without losing information, and it would be nice if the tool
offered the user a way to selectively discard information in such a way
as to maximize the value of the resulting repository.

> I can see three ways to fix that mismatch:
> [...]

It's premature to think about technical solutions before the conceptual
decisions have been made.  But all else being equal, I think that having
a lingua franca for DVCSs would be a big advantage, and git-fast-import
format is the only obvious contender right now.

Michael

