SVN conversion questions

Patrick Mézard pmezard at gmail.com
Sun Mar 30 11:31:32 CDT 2008


Jeff Squyres wrote:
> On Mar 24, 2008, at 1:46 PM, Patrick Mézard wrote:
> 
>> I am tempted to add a way to specify exactly which branches are being
>> converted. I am not sure where to put that yet; ideas are welcome.
>> Maybe as a comma-separated list of svn paths (are commas allowed in svn
>> paths?). That would also enable support for deleted branches.
> 
> How about always picking up everything in the SVN repo unless told
> otherwise?  I would assume that if you're converting an SVN repo to
> mercurial, you'd want *everything* from history.  Even if you didn't
> necessarily want everything, wouldn't it be more conservative to convert
> everything and let the user prune the final result if there was more
> than they wanted (or perhaps use an option that says "only grab trunk,
> tags, branches")?

That is right when you are performing a full conversion, usually intended to replace the source repository completely. Another popular use case is converting only some projects, sometimes partially (with convert.svn.startrev, for instance), and tracking them over time. You do that when you contribute to an SVN project without using SVN. At work, we officially use SVN, but I work only with Mercurial and use conversions of some projects to do code reviews in Mercurial.

The second problem is: define "*everything*". Subversion repositories are loosely structured, and that apparent flexibility lets people *really* mess them up. For instance, it is hard to tell that "v1.2-series" is a container of versions and not a project version without digging through the whole project ancestry. Worse, at work we stupidly set up our Subversion repository so that "trunk", "branches" and "tags" are root projects containing subprojects, so the tags of projectA are mixed with the tags of projectB. Again, this can be dealt with automatically, but it requires a lot of work (as in processing time).

The convert extension tries to find a trade-off between full conversions like yours, where you don't really care about conversion time, and hourly incremental conversions like mine, where every second counts. It currently assumes the source repository has a canonical "trunk/branches/tags" layout and makes sense of it from there. I have nothing against adding flexibility there, but automation is really expensive.
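To make the canonical-layout assumption concrete, here is a rough Python sketch of how an SVN path could be classified under a plain "trunk/branches/tags" layout. It is not the convert extension's code: the name `classify_path`, the rule that branch and tag names sit exactly one level deep, and mapping trunk to Mercurial's "default" branch are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical helper, NOT part of the convert extension. It assumes a
# canonical layout where branch and tag names live exactly one level
# below "branches/" and "tags/", and trunk maps to the "default" branch.
def classify_path(path: str) -> Optional[Tuple[str, str, str]]:
    """Return (kind, name, subpath) for an SVN path, or None if it does not fit."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        return None
    if parts[0] == "trunk":
        return ("trunk", "default", "/".join(parts[1:]))
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        kind = "branch" if parts[0] == "branches" else "tag"
        return (kind, parts[1], "/".join(parts[2:]))
    # Anything else is a non-canonical layout; deciding what it is would
    # require scanning the project ancestry, which is the expensive part.
    return None

if __name__ == "__main__":
    for p in ["trunk/src/main.c",
              "branches/v1.2-series/v1.2.3/README",
              "tags/projectA/1.0/Makefile",
              "projectA/trunk/Makefile"]:
        print(p, "->", classify_path(p))
```

With the layout described above, `tags/projectA/1.0/...` comes out as a tag named `projectA`, and `branches/v1.2-series/v1.2.3/...` as a branch named `v1.2-series`. Telling those cases apart correctly is what would need the slower ancestry scan.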


> Here are two more off-the-wall SVN->mercurial conversion questions...
> 
> 1. I made an authors file to remap SVN commit usernames to proper
> name/e-mail strings.  However, our project is a few years old; we've had
> both
> 
> a) individuals move between different member organizations, and
> b) member organizations change e-mail domains
> 
> And therefore several members have had multiple different e-mail
> addresses over the years.  The SVN commit ID "jsquyres", for example,
> used to mean jsquyres at open dash mpi dot org, but now means jsquyres
> at cisco dot com.  Other SVN IDs have had 3 or more e-mail addresses
> associated with them (e.g., students interning at other Open MPI member
> organizations during the summer).
> 
> Is there a way to associate different e-mail addresses with a Subversion
> ID based on a time range?  Perhaps something like:
> 
> SVN_id="name and address string" start_date stop_date
> 
> jsquyres="Jeff Squyres <jsquyres at foo.example.com>" - 12/31/2006
> jsquyres="Jeff Squyres <jsquyres at bar.example.com>" 01/01/2007 12/31/2007
> jsquyres="Jeff Squyres <jsquyres at yow.example.com>" 01/01/2008 -
> 
> (where "-" means "beginning of time" or "end of time", depending on
> which field it was used)
> 
> ...or something along these lines (date formats may vary, etc.).  For
> backwards compatibility, if no quotes are used, the whole line can be
> taken as the string -- the new start/stop date stuff can be used only if
> there are quotes.
> 
> I realize that this is a super-picky question :-) 

Yes :-)

> ; I'm just looking to
> see if we can be as accurate in the history as possible when converting
> from SVN to hg.

There is no support for that; patches are welcome (I think?).
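For illustration only, here is a minimal Python sketch of the lookup such a patch might perform. It parses the line format proposed above (quoted name, then start and stop dates, with "-" for an open end) and returns the entry whose range contains the commit date. The MM/DD/YYYY date format, the function names `load_authormap` and `map_author`, and the fallback to the bare SVN ID are all assumptions; none of this exists in the convert extension.

```python
import re
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical format from the proposal above (not supported by convert):
#   id="Name <email>" START STOP   dates as MM/DD/YYYY, "-" = open-ended
#   id=Name <email>                old format, valid for all dates
RANGED = re.compile(r'^(\S+?)="([^"]*)"\s+(\S+)\s+(\S+)\s*$')
PLAIN = re.compile(r'^(\S+?)=(.+)$')

Entry = Tuple[Optional[date], Optional[date], str]

def parse_day(field: str) -> Optional[date]:
    # "-" stands for the beginning or end of time, represented as None.
    if field == "-":
        return None
    return datetime.strptime(field, "%m/%d/%Y").date()

def load_authormap(lines: List[str]) -> Dict[str, List[Entry]]:
    amap: Dict[str, List[Entry]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = RANGED.match(line)
        if m:
            user, name, start, stop = m.groups()
            amap.setdefault(user, []).append((parse_day(start), parse_day(stop), name))
            continue
        m = PLAIN.match(line)
        if m:
            # No quotes: the whole right-hand side is the name, for all dates.
            user, name = m.groups()
            amap.setdefault(user, []).append((None, None, name.strip()))
    return amap

def map_author(amap: Dict[str, List[Entry]], user: str, when: date) -> str:
    # First entry whose [start, stop] range contains the commit date wins.
    for start, stop, name in amap.get(user, []):
        if (start is None or start <= when) and (stop is None or when <= stop):
            return name
    return user  # unknown user or no matching range: keep the SVN ID

if __name__ == "__main__":
    amap = load_authormap([
        'jsquyres="Jeff Squyres <jsquyres@foo.example.com>" - 12/31/2006',
        'jsquyres="Jeff Squyres <jsquyres@bar.example.com>" 01/01/2007 12/31/2007',
        'jsquyres="Jeff Squyres <jsquyres@yow.example.com>" 01/01/2008 -',
    ])
    print(map_author(amap, "jsquyres", date(2007, 6, 1)))
```

Keeping a list of entries per user, rather than a single string, is what lets the same SVN ID resolve to different addresses over time while still accepting the old one-line-per-user form.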
 
> 2. In our commit messages, we not-infrequently use strings of the form
> "r[0-9]+" to refer to other SVN commits.  We do this not only because it
> helps us track our code internally, but also because we use Trac for our
> bug tracking and SVN browsing.  When viewing SVN commits in the Trac
> browser, Trac automatically hyperlinks strings in the commit message
> matching "r[0-9]+" to the corresponding commit entry.
> 
> However, after converting to hg, these r numbers in commit messages will
> no longer be relevant, and we'll lose both the historical references and
> the [incredibly convenient] cross-hyperlinking.  Is there a way to get
> the conversion process to either change (or add) the "r[0-9]+" strings
> to the relevant mercurial hash ID?  Perhaps a commit message that reads
> like this:
> 
> Fix borked r1234 commit.
> 
> Could be changed to:
> 
> Fix borked 6d8ec087ecd6e874bb5f44a3616878db89632892 commit.
> or
> Fix borked r1234 (6d8ec087ecd6e874bb5f44a3616878db89632892) commit.

Again, there is no support for that, but it would be valuable. I think it should be done in two steps:
1- Convert from Subversion; at the end you have an svn -> hg revision mapping
2- Rewrite the converted repository with an hg -> hg conversion, rewriting the changesets programmatically using the mapping built in [1]

Step [2] is not supported. Also, once you can rewrite log messages, you can rewrite authors with your own rules too, which would solve your first question. It looks like a user-supplied Python hook at the right place in the conversion process should do it.
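As a rough idea of the message rewriting in step [2], here is a minimal Python sketch. It assumes the svn -> hg mapping from step [1] has already been loaded into a plain dict from SVN revision number to hg changeset hash; how that dict is built is left out, and the name `rewrite_refs` is hypothetical. It produces the second form suggested above (keep "r1234", append the hash) and leaves references to revisions that were never converted untouched.

```python
import re
from typing import Dict

# Matches "r1234" as a whole word, so words like "error12" are not touched.
REV_REF = re.compile(r"\br(\d+)\b")

def rewrite_refs(message: str, svn_to_hg: Dict[int, str]) -> str:
    """Append the hg hash after every rNNNN reference found in svn_to_hg.

    Hypothetical helper: svn_to_hg is assumed to map SVN revision numbers
    to full hg changeset hashes produced by the first conversion.
    """
    def replace(m: "re.Match[str]") -> str:
        node = svn_to_hg.get(int(m.group(1)))
        if node is None:
            return m.group(0)  # not converted (e.g. other project): keep as-is
        return "%s (%s)" % (m.group(0), node)

    return REV_REF.sub(replace, message)

if __name__ == "__main__":
    mapping = {1234: "6d8ec087ecd6e874bb5f44a3616878db89632892"}
    print(rewrite_refs("Fix borked r1234 commit.", mapping))
    # prints: Fix borked r1234 (6d8ec087ecd6e874bb5f44a3616878db89632892) commit.
```

Returning `m.group(0)` for unknown revisions matters for partial conversions (for example with convert.svn.startrev), where older revision numbers have no converted counterpart.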

--
Patrick Mézard


More information about the Mercurial mailing list