hg-git and round-tripping (and file copies?)

Mike Hommey mh at glandium.org
Thu Mar 16 18:53:01 EDT 2017


On Thu, Mar 16, 2017 at 01:38:18PM -0700, Gregory Szorc wrote:
> On Thu, Mar 16, 2017 at 1:05 PM, Danek Duvall <danek.duvall at oracle.com>
> wrote:
> 
> > In trying to convert
> >
> >     https://hg.java.net/hg/solaris-userland~gate
> >
> > to a git repo and back, I'm seeing issues at changeset 34, where the hash
> > changes for reasons I can't see.  If I do a diff of the debug log, I see
> > it's due to the manifest:
> >
> >     $ diff -u =(hg log -R userland-more --debug -r 34) =(hg log -R userland-more.hgagain --debug -r 34 | grep -v "^phase:")
> >     --- /tmp/zshhHyEIb      2017-03-16 11:37:57.601340643 -0700
> >     +++ /tmp/zshlyqHbd      2017-03-16 11:37:57.793642372 -0700
> >     @@ -1,12 +1,10 @@
> >      no terminfo entry for sitm
> >     -changeset:   34:d20b10eba31725ad8954aa6d20374da512f0e636
> >     -tag:         build-149
> >     +changeset:   34:2ccb817b85926f410df2a6bd23000265805088df
> >      parent:      33:371c8e56136d19872ae7db8d273f9de78c8fa783
> >      parent:      -1:0000000000000000000000000000000000000000
> >     -manifest:    34:e031f26e68549dadb3dfb4705d429c75622a58b4
> >     +manifest:    34:5a12a2a1bf3e7c0f7c30d01bd09a2e37185bcfb6
> >      user:        Norm Jacobs <Norm.Jacobs at Sun.COM>
> >      date:        Sun Sep 19 13:50:53 2010 -0700
> >     -phase:       public
> >      files:
> >         components/Makefile
> >         make-rules/prep.mk
> >
> > and if I use debugdata to look at the manifest at changeset 34, I see:
> >
> >     $ gdiff -a -u =(hg -R userland-more debugdata -m 34) =(hg -R userland-more.hgagain debugdata -m 34)
> >     --- /tmp/zshOdnjza      2017-03-16 11:53:16.971130878 +0000
> >     +++ /tmp/zshzoTzmc      2017-03-16 11:53:17.118194061 +0000
> >     @@ -24,12 +24,12 @@
> >      make-rules/setup.py.mk302733d738cc7c6cceb63457442f24f931867472
> >      make-rules/shared-macros.mk03dd5df583b6e39a17ba66fc6ed6205df7f6be49
> >      tools/Makefilecc964766028e3b963b4a321c88815d211415006b
> >     -tools/bass-o-matica618ef38ceda467b9a09680dd8b94debcd303037x
> >     +tools/bass-o-matic349f9611499fddf1a110f9488a84fb110c90b7bfx
> >      tools/build-watch.df69b9a2b6a265c06268733430bbf3f9aa7d5e160x
> >      tools/build-watch.pl5e23340c7a84ac555e630a5ccdc28eceda95f4b6x
> >      tools/time.ca0a1f64ff8ac947ce9d045e0448f8ee72f9fd273
> >     -tools/userland-fetch851170bb5cebf2648c53d4909eac26ac2055cdd3x
> >     -tools/userland-unpack0977e35fa356d4cfab889b93613dc75d90d89b6bx
> >     +tools/userland-fetchbae023e70db29fd07f6f989aaa858cfaed09238ax
> >     +tools/userland-unpackb3800b9db86df38a644a653b3095805b269b6ac6x
> >      transforms/actuatorsc9d84677229efde5f89b1d985de5cd1b09267b56
> >      transforms/archive-libraries-drop5b346a0133242f460ff66f6689790da094ce27f6
> >      transforms/comparison-cleanupde1288c586594a171d43a3da5234cb920be408cc
> >
> > Now, those three files were copied in that changeset, but they're not the
> > first to be copied, so it's not that, strictly.  But it is the first
> > changeset in which files were copied without being modified.
> >
> > The index data is off-by-one, if that makes any difference:
> >
> >     $ hg -R userland-more debugrevlog -d tools/bass-o-matic
> >     # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
> >         0    -1    -1     0  2175          0    0    0    0    6005      6005           2     1        0
> >         1     0    -1  2175  2228          0    0    0    0    5929     11934           5     1        1
> >
> >     $ hg -R userland-more.hgagain debugrevlog -d tools/bass-o-matic
> >     # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
> >         0    -1    -1     0  2174          0    0    0    0    6005      6005           2     1        0
> >         1     0    -1  2174  2227          0    0    0    0    5929     11934           5     1        1
> >
> > Any thoughts on how to further debug this?
> >
> > Or is this just
> >
> >     https://bitbucket.org/durin42/hg-git/issues/46

Note that this bug is about git->hg conversion, where the original
repository is git.

> >
> > and I'm out of luck?
> >
> 
> It is effectively impossible to round-trip between Git and Mercurial when
> file copies are involved. This is because Mercurial's filelog hashes
> include copy metadata and the parent nodes. Git's blob hashes, by contrast,
> are effectively content only. When you convert from Mercurial to Git, it
> will drop copy metadata (because Git doesn't track it explicitly). Then
> when you convert back to Mercurial, the copies have to be detected "just
> right" by hg-git for the hashes to align. Furthermore, the files have to be
> reintroduced in the same order, or the filelog parents may not align and
> the hashes may diverge. If a repo isn't linear, there's a non-zero chance
> of that happening.
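
A minimal sketch of the difference described above, not code taken from
Mercurial or hg-git. It assumes the usual definitions: a git blob id is
the SHA-1 of a "blob <size>\0" header plus the content, and a mercurial
filelog node is the SHA-1 of the two parent nodes (sorted) plus the
revision text, where the text of a copied file starts with a
copy/copyrev metadata block. The helper names, the path tools/old-name
and the all-"a" copyrev are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Optional

NULLID = b"\0" * 20  # Mercurial's null node (20 zero bytes)


def git_blob_id(content: bytes) -> str:
    """SHA-1 of a git blob: header + content, nothing else."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def hg_filelog_text(content: bytes,
                    copy_source: Optional[bytes] = None,
                    copy_rev: Optional[bytes] = None) -> bytes:
    """Revision text as stored in a filelog.

    For a copy, a metadata block delimited by b"\\x01\\n" is prepended.
    (Simplification: content that itself starts with b"\\x01\\n" is not
    handled here.)
    """
    if copy_source is None:
        return content
    meta = b"copy: %s\ncopyrev: %s\n" % (copy_source, copy_rev)
    return b"\x01\n" + meta + b"\x01\n" + content


def hg_node_id(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> str:
    """SHA-1 over the sorted parent nodes followed by the revision text."""
    low, high = sorted((p1, p2))
    return hashlib.sha1(low + high + text).hexdigest()


content = b"#!/bin/sh\necho hello\n"

# Same bytes on disk, recorded once as a plain add and once as a copy.
# The source path and copyrev below are made-up values.
plain_node = hg_node_id(hg_filelog_text(content))
copied_node = hg_node_id(
    hg_filelog_text(content, b"tools/old-name", b"a" * 40))

print("git blob id:      ", git_blob_id(content))
print("hg node (plain):  ", plain_node)
print("hg node (copied): ", copied_node)
```

The git id depends only on the content, so it is the same whether or not
the file was a copy. The two mercurial nodes differ, and any change to
the recorded copy source, the copyrev, or the filelog parents changes
the node again, which then changes the manifest and changeset hashes.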

hg-git actually "stores" copy/rename information in the commit messages,
but only when the commit was made in mercurial and pushed to git with
hg-git in the first place. It should be able to recreate copies/renames
from that information, but there are subtle cases where that's not
possible without additional information that is not available.

If your goal in trying to round-trip between mercurial and git is to let
developers use mercurial or git as they like, and somehow make it work
with developers pushing on both ends, you should instead use a single
source of truth (mercurial or git, whichever you prefer keeping a server
for), and let developers use conversion tools on their end. hg-git can
be used by developers who prefer mercurial when the server is git
(although it annoyingly adds visible metadata to git commits in that
case). git-cinnabar or git-remote-hg can be used by developers who
prefer git when the server is mercurial. (Full disclosure: I'm the
author of git-cinnabar.)

Mike

