Request for help adding Mercurial output to cvs2svn

Patrick Mézard pmezard at gmail.com
Mon Feb 11 15:26:05 CST 2008


Michael Haggerty wrote:
> I am the main developer of cvs2svn [1], a Python program that can
> convert CVS repositories to Subversion or to git while retaining all
> project history.  I would like some help adding the ability to convert
> from CVS to Mercurial.
> 
> cvs2svn is a very robust tool for one-time conversions from CVS to other
> SCMs.  Aside from being able to handle just about every CVS repository
> that we've ever seen, cvs2svn has very many customization options [2].
> Now that Mercurial is reaching 1.0 status, I think it would benefit
> greatly from having a very solid tool like cvs2svn to help people
> migrate from CVS to Mercurial without fear of losing data.  (And I have
> a bad conscience appearing to favor git over Mercurial :-) )
> 
> I have played with Mercurial a little bit but am no expert.  But I *am*
> an expert in many bizarre aspects of CVS repositories and in cvs2svn.
> Some months ago I added git output to cvs2svn.  It only took a day or
> two of programming to get it to output data that can be read by the
> git-fast-import tool [3].  I naively imagine that it should be a
> comparable amount of work to add hg output.
> 
> I have some questions:
> 
> 1. Is there a documented way of getting data into Mercurial, short of
> creating a working directory and committing changesets one by one?
> 
>   - Most convenient would be a dump file format that can be loaded into a
> repository by Mercurial (analogous to those read by "svnadmin load" or
> "git-fast-import").  If so, is this format documented?

There is nothing like that. You could generate thousands of git diffs, but that would not be very efficient.

>   - Another alternative would be a Python API that could be used by
> cvs2svn to get commits into a Mercurial repository efficiently,
> including metadata (e.g., commit timestamps that are not equal to the
> current time and authors not equal to the person running the
> conversion).  If so, is it documented somewhere or is there example code
> that uses this API?

No, it's not documented, but the convert extension (http://hg.intevation.org/mercurial/file/b7f44f01a632/hgext/convert/) already does that. "common.py" describes the "sink" interface and "hg.py" implements it. The interface is strongly coupled to the conversion process, and calls are expected to arrive in a well-defined order, but the implementation itself is simple. The conversion takes place in convcmd.convert. The Mercurial sink is probably more complicated than you expect, but it supports many conversion modes (more details below) and works incrementally.
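To illustrate what "calls are expected to come in a well defined order" can mean for a sink, here is a minimal toy sketch. It is not the convert extension's code: the class RecordingSink and the method names put_file, put_commit and put_tags are hypothetical names chosen for this example, and the real interface in "common.py" may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Metadata a converter carries over from the source repository."""
    author: str
    date: str      # original commit timestamp, not the conversion time
    desc: str
    parents: list  # source revision ids of the parent commits


@dataclass
class RecordingSink:
    """Hypothetical stand-in for a conversion sink; it only records calls."""
    revmap: dict = field(default_factory=dict)   # source rev -> sink rev
    log: list = field(default_factory=list)      # one entry per commit
    pending: dict = field(default_factory=dict)  # files staged for next commit

    def put_file(self, name, data):
        self.pending[name] = data

    def put_commit(self, source_rev, commit):
        # Ordering constraint: every parent must have been converted first.
        missing = [p for p in commit.parents if p not in self.revmap]
        if missing:
            raise ValueError(f"parents not converted yet: {missing}")
        sink_rev = f"sink-{len(self.revmap)}"
        self.revmap[source_rev] = sink_rev
        self.log.append(
            (sink_rev, commit.author, commit.date, sorted(self.pending))
        )
        self.pending = {}
        return sink_rev

    def put_tags(self, tags):
        # Tags are resolved once, at the end of a conversion pass.
        return {name: self.revmap[rev] for name, rev in tags.items()}


def convert(commits, tags, sink):
    """Drive the sink: files first, then the commit, in topological order."""
    for source_rev, commit, files in commits:
        for name, data in files.items():
            sink.put_file(name, data)
        sink.put_commit(source_rev, commit)
    return sink.put_tags(tags)
```

The revmap dictionary is what would let such a sink work incrementally: a later pass can skip source revisions that already have an entry.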

More specific questions are welcome.

> 2. What would be the natural, idiomatic way to represent CVS branches

There are two options:
- Represent them with named branches. This is the default mode of the convert extension. All branches are converted into a single Mercurial repository and each one is mapped to a named branch. A named branch is a kind of inherited tag: "hg branch" tags the working directory with the supplied branch name, and subsequent commits are tagged as well. Named branches are listed with "hg branches" and can be passed as revision specifiers in many operations. There are issues with their use as a development tool, but I think this is the best choice for repository conversion.
- Create one target repository per branch. That's what "convert.hg.clonebranches=1" does. This is the natural way to work with branches in Mercurial, but I think it is unsuitable for a conversion: cloning the conversion result means cloning every branch. Plus, generating multiple repositories at once is much slower.

These approaches are not mutually exclusive: individual clones can be extracted from a repository containing multiple named branches, and if the clones are themselves on named branches, they can be pulled back into a single repository.
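The "inherited tag" behaviour of named branches can be modelled in a few lines. This is only a toy model under my own assumptions (the ToyRepo class and its methods are invented for the example and linear history is assumed); it is not how Mercurial stores changesets.

```python
class ToyRepo:
    """Toy model of named branches; not Mercurial's implementation."""

    def __init__(self):
        self.parent = {}  # rev -> parent rev (None for the root)
        self.branch = {}  # rev -> branch name recorded on the changeset

    def commit(self, rev, parent=None, branch=None):
        # Like "hg branch NAME" followed by a commit: an explicit name wins,
        # otherwise the name is inherited from the parent changeset.
        if branch is None:
            branch = self.branch[parent] if parent is not None else "default"
        self.parent[rev] = parent
        self.branch[rev] = branch

    def branches(self):
        # Rough analogue of "hg branches": the distinct names in use.
        return sorted(set(self.branch.values()))

    def ancestors(self, rev):
        # The history a clone limited to this one head would contain.
        out = []
        while rev is not None:
            out.append(rev)
            rev = self.parent[rev]
        return out[::-1]


repo = ToyRepo()
repo.commit("r0")
repo.commit("r1", parent="r0")
repo.commit("r2", parent="r0", branch="stable")  # branch point
repo.commit("r3", parent="r2")                   # inherits "stable"
```

Here repo.ancestors("r3") stands for extracting a per-branch clone from the combined repository: it contains the shared root plus the "stable" changesets, and nothing from the other line of development.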

> and tags in Mercurial?

Use tags. Mercurial tags differ from Git ones in that they are part of the history rather than additional metadata. Tags are defined as mappings from revision identifiers to tag names, written in a .hgtags text file that is versioned along with all other repository files (there are also local, unversioned tags, but they would not be preserved when cloning the repository). The question here is: when should these tags be written and committed? When you use "hg tag", a tag revision is appended after the target revision, which is what you would expect. The convert extension generates tags at the end of every conversion pass (it can work incrementally). This means tags can be defined long after the revisions they reference, which may be a little problematic if you work in the revision range between a referenced revision and the next tag file update. I think this is annoying at worst, and you can do the same, at least until you are happy with everything else.
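As a small sketch of the .hgtags mapping described above: each line of the file holds a full 40-character hexadecimal changeset id, a space, and the tag name. The helper name render_hgtags is my own, and sorting the entries by tag name is just a choice made for this example.

```python
def render_hgtags(tags):
    """Return .hgtags content for a {tag name: 40-hex changeset id} mapping."""
    lines = []
    for name, node in sorted(tags.items()):
        # Only full, lowercase hexadecimal changeset ids are accepted here.
        if len(node) != 40 or any(c not in "0123456789abcdef" for c in node):
            raise ValueError(f"not a full changeset id: {node!r}")
        lines.append(f"{node} {name}\n")
    return "".join(lines)
```

A converter that writes tags at the end of a pass would generate this content once, after all referenced revisions exist, and commit it as an ordinary file change.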

--
Patrick Mézard

