divergence using the convert extension

Benoit Fouletier bennews at free.fr
Thu Jun 21 08:52:09 UTC 2018


I have a pretty big Mercurial repo (40 GB) from a Unity game developed
over almost 3 years by over a dozen contributors.
I want to make a new game using the same codebase/engine, but I'd like to
get rid of (most of) the content to avoid starting with a bloated repo.
Basically, the game's textures and sound files account for 90% of the repo
size, so I want to strip them out. But obviously, I want to keep the history.

For this I'm using the convert extension. So far I've only used it as a
convoluted clone, copying everything without modification (as a sanity
check/stress test), and after a bugfix
(https://bz.mercurial-scm.org/show_bug.cgi?id=5526) it ran successfully over
all 26K changesets: awesome!

However, the resulting converted copy is 72.3 GB, while the original is
60.5 GB: a 20% increase! The size difference comes from the .hg/store/data
directory itself. For example, I found one Photoshop file (it appears as
.psd.d in the store, to be exact) that went from 360 MB to 534 MB, even
though its history shows only one modification after the initial add. Is
that expected? Is it a side effect of convert, perhaps handling binary
deltas differently? Is there a way to recompress the store?
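To find which store files grew the most (like the .psd.d file above), one
option is to walk .hg/store/data in both repositories and rank files by
size increase. This is a minimal sketch, assuming both stores use the same
filename encoding so that relative paths line up between them; the
function names store_sizes and rank_growth are hypothetical helpers, not
part of Mercurial.

```python
import os
from typing import Dict, List, Tuple


def store_sizes(store_data_dir: str) -> Dict[str, int]:
    """Map each file under store_data_dir (path relative to it) to its size in bytes.

    A nonexistent directory simply yields an empty map, since os.walk
    silently skips a missing top directory.
    """
    sizes: Dict[str, int] = {}
    for root, _dirs, files in os.walk(store_data_dir):
        for name in files:
            full_path = os.path.join(root, name)
            sizes[os.path.relpath(full_path, store_data_dir)] = os.path.getsize(full_path)
    return sizes


def rank_growth(before: Dict[str, int], after: Dict[str, int],
                top: int = 10) -> List[Tuple[str, int, int]]:
    """Return (path, old_size, new_size) for files present in both maps,
    sorted by largest absolute growth first, truncated to `top` entries."""
    rows = [(path, before[path], size) for path, size in after.items() if path in before]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows[:top]
```

Usage, with placeholder repository paths:
rank_growth(store_sizes("original/.hg/store/data"),
store_sizes("converted/.hg/store/data")). Files that exist in only one
store are ignored, since the goal here is to spot revlogs that were
rewritten larger rather than added or dropped.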

Second, since I didn't modify anything (yet) during the conversion, I was
expecting the hashes to stay intact (and ideally the end result to be
bitwise identical), but that's not what I'm seeing.
I tracked it down: the first 11K changesets have identical hashes, until
one changeset (a merge, actually) suddenly diverges. I exported both the
original and converted changesets as patches and diffed the patches: I
don't see any difference (except for the hash itself), and I can't figure
out why it diverges. I'm wondering if it could be a line-ending issue,
although no differences are readily apparent, and even so, why would
convert introduce a change?
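The convert extension records its source-to-destination mapping in the
destination repository's .hg/shamap file, one "<source id> <destination id>"
pair per line, in the order the changesets were converted. Because a
changeset hash covers its parents' hashes, every descendant of the first
mismatch also gets a new hash, so the first line whose two ids differ
points at the changeset worth inspecting. This is a minimal sketch assuming
that file layout; first_divergence is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def first_divergence(shamap_lines: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (line_number, source_node, dest_node) for the first shamap entry
    whose destination hash differs from its source hash, or None if all match.

    Assumes each non-blank line has the form "<source node> <destination node>".
    """
    for line_number, line in enumerate(shamap_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        source_node, dest_node = fields
        if source_node != dest_node:
            return line_number, source_node, dest_node
    return None
```

Usage, with a placeholder path: open "converted/.hg/shamap" and pass the
file object to first_divergence; the returned source and destination nodes
can then be looked up with hg log in each repository.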

Obviously, since I'm gonna strip stuff anyway, I _will_ diverge from the
original pretty early and lose all the hashes. That's fine; this is more
out of curiosity, and to make sure nothing too fishy is going on.