hg-git and round-tripping (and file copies?)

Gregory Szorc gregory.szorc at gmail.com
Thu Mar 16 16:38:18 EDT 2017


On Thu, Mar 16, 2017 at 1:05 PM, Danek Duvall <danek.duvall at oracle.com> wrote:

> In trying to convert
>
>     https://hg.java.net/hg/solaris-userland~gate
>
> to a git repo and back, I'm seeing issues at changeset 34, where the hash
> changes for reasons I can't see.  If I do a diff of the debug log, I see
> it's due to the manifest:
>
>     $ diff -u =(hg log -R userland-more --debug -r 34) =(hg log -R userland-more.hgagain --debug -r 34 | grep -v "^phase:")
>     --- /tmp/zshhHyEIb      2017-03-16 11:37:57.601340643 -0700
>     +++ /tmp/zshlyqHbd      2017-03-16 11:37:57.793642372 -0700
>     @@ -1,12 +1,10 @@
>      no terminfo entry for sitm
>     -changeset:   34:d20b10eba31725ad8954aa6d20374da512f0e636
>     -tag:         build-149
>     +changeset:   34:2ccb817b85926f410df2a6bd23000265805088df
>      parent:      33:371c8e56136d19872ae7db8d273f9de78c8fa783
>      parent:      -1:0000000000000000000000000000000000000000
>     -manifest:    34:e031f26e68549dadb3dfb4705d429c75622a58b4
>     +manifest:    34:5a12a2a1bf3e7c0f7c30d01bd09a2e37185bcfb6
>      user:        Norm Jacobs <Norm.Jacobs at Sun.COM>
>      date:        Sun Sep 19 13:50:53 2010 -0700
>     -phase:       public
>      files:
>         components/Makefile
>         make-rules/prep.mk
>
> and if I use debugdata to look at the manifest at changeset 34, I see:
>
>     $ gdiff -a -u =(hg -R userland-more debugdata -m 34) =(hg -R userland-more.hgagain debugdata -m 34)
>     --- /tmp/zshOdnjza      2017-03-16 11:53:16.971130878 +0000
>     +++ /tmp/zshzoTzmc      2017-03-16 11:53:17.118194061 +0000
>     @@ -24,12 +24,12 @@
>      make-rules/setup.py.mk302733d738cc7c6cceb63457442f24f931867472
>      make-rules/shared-macros.mk03dd5df583b6e39a17ba66fc6ed6205df7f6be49
>      tools/Makefilecc964766028e3b963b4a321c88815d211415006b
>     -tools/bass-o-matica618ef38ceda467b9a09680dd8b94debcd303037x
>     +tools/bass-o-matic349f9611499fddf1a110f9488a84fb110c90b7bfx
>      tools/build-watch.df69b9a2b6a265c06268733430bbf3f9aa7d5e160x
>      tools/build-watch.pl5e23340c7a84ac555e630a5ccdc28eceda95f4b6x
>      tools/time.ca0a1f64ff8ac947ce9d045e0448f8ee72f9fd273
>     -tools/userland-fetch851170bb5cebf2648c53d4909eac26ac2055cdd3x
>     -tools/userland-unpack0977e35fa356d4cfab889b93613dc75d90d89b6bx
>     +tools/userland-fetchbae023e70db29fd07f6f989aaa858cfaed09238ax
>     +tools/userland-unpackb3800b9db86df38a644a653b3095805b269b6ac6x
>      transforms/actuatorsc9d84677229efde5f89b1d985de5cd1b09267b56
>      transforms/archive-libraries-drop5b346a0133242f460ff66f6689790da094ce27f6
>      transforms/comparison-cleanupde1288c586594a171d43a3da5234cb920be408cc
>
> Now, those three files were copied in that changeset, but they're not the
> first to be copied, so it's not that, strictly.  But it is the first
> changeset in which files were copied without being modified.
>
> The index data is off-by-one, if that makes any difference:
>
>     $ hg -R userland-more debugrevlog -d tools/bass-o-matic
>     # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
>         0    -1    -1     0  2175          0    0    0    0    6005      6005           2     1        0
>         1     0    -1  2175  2228          0    0    0    0    5929     11934           5     1        1
>
>     $ hg -R userland-more.hgagain debugrevlog -d tools/bass-o-matic
>     # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
>         0    -1    -1     0  2174          0    0    0    0    6005      6005           2     1        0
>         1     0    -1  2174  2227          0    0    0    0    5929     11934           5     1        1
>
> Any thoughts on how to further debug this?
>
> Or is this just
>
>     https://bitbucket.org/durin42/hg-git/issues/46
>
> and I'm out of luck?
>

It is effectively impossible to round-trip between Git and Mercurial when
file copies are involved. This is because Mercurial's filelog hashes cover
the file's parent nodes and its copy metadata as well as its content, while
Git's blob hashes cover only the content. Converting from Mercurial to Git
drops the copy metadata, because Git doesn't track copies explicitly. When
you then convert back to Mercurial, hg-git has to detect the copies "just
right" for the hashes to align. Furthermore, the files have to be
reintroduced in the same order, or the filelog parents may not align and
the hashes will diverge. If a repo isn't linear, there's a non-zero chance
of that happening.
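To make the hashing difference concrete, here is a minimal Python sketch of
the two schemes, assuming the commonly documented formats: a Mercurial
revlog node is the SHA-1 of the two parent nodes (smaller first) followed
by the revision text, and filelog copy information lives in a metadata
header framed by "\x01\n" markers inside that text; a Git blob id is the
SHA-1 of a "blob <size>\0" header plus the file bytes. The helper names
(hg_node, hg_filelog_text, git_blob_id), the file content and the
"tools/old-name" path are hypothetical, and this is not code taken from
Mercurial or hg-git.

```python
import hashlib
from typing import Optional

NULLID = b"\x00" * 20  # Mercurial's null node: 20 zero bytes


def hg_node(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> str:
    """Revlog-style node: SHA-1 of both parent nodes (smaller first), then the text."""
    a, b = sorted((p1, p2))
    return hashlib.sha1(a + b + text).hexdigest()


def hg_filelog_text(data: bytes, copy_from: Optional[bytes] = None,
                    copy_rev: Optional[str] = None) -> bytes:
    """Build the text a filelog would hash: optional copy-metadata header, then the data."""
    if copy_from is None:
        # Data that itself starts with the marker gets an empty header so it isn't misparsed.
        if data.startswith(b"\x01\n"):
            return b"\x01\n\x01\n" + data
        return data
    meta = b"copy: %s\ncopyrev: %s\n" % (copy_from, copy_rev.encode("ascii"))
    return b"\x01\n" + meta + b"\x01\n" + data


def git_blob_id(data: bytes) -> str:
    """Git blob id: SHA-1 of a 'blob <size>\\0' header followed by the raw bytes."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


if __name__ == "__main__":
    content = b"echo hello\n"  # hypothetical file content
    original = hg_node(hg_filelog_text(content))
    # Hypothetical unmodified copy: same bytes, but hg records the source in metadata.
    copied = hg_node(hg_filelog_text(content, b"tools/old-name", original))
    # Same bytes, no copy, but a different filelog parent (e.g. replayed in another order).
    reparented = hg_node(hg_filelog_text(content), p1=bytes.fromhex(original))
    print("git blob id (same in all three cases):", git_blob_id(content))
    print("hg filelog nodes (all different):", original, copied, reparented)
```

The Git side only ever sees the raw content, so nothing in the blob id
tells the converter which copy header or which filelog parents to recreate;
it has to infer both exactly to reproduce the original Mercurial nodes.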

It is best to have a single canonical repo and replicate from it.
Attempting to "sync" between multiple independent repos will only lead to
divergence.


More information about the Mercurial-devel mailing list