Importing to hg from perforce with large binary files

Greg Ward greg-hg at gerg.ca
Sat Jan 29 14:17:54 CST 2011


On Thu, Jan 27, 2011 at 10:09 AM, Mark Mason <mason97123 at gmail.com> wrote:
> We're also looking at your cvs2hg converter script for conversion of our
> large CVS repo. Fortunately, while there are some large files in there,
> none of them are binaries, so we can sidestep at least this problem. :)

Keep in mind that Mercurial has no problem whatsoever with binary
files.  It has one problem with *any* sort of large file: it assumes
that it's managing source code, i.e. that any file is small enough to hold
in memory -- up to 3 copies if doing a merge.  And it has another
problem with already-compressed data: revlog is based on storing
compressed binary deltas, and compressed data doesn't yield very good
deltas.  And I imagine the deltas it does yield would not be very
compressible.

So large already-compressed data is the worst case.  And many binary
formats (.gz, .zip, .jpeg, .png, .mpeg, ...) are already compressed.
Thus, we use "large binary" as shorthand for "large, compressed, not
delta-friendly".

IOW: a 1 MB JPEG is definitely worse than a 1 MB XML file, but a 1 MB
XML file is worse than a 100 kB TIFF file (uncompressed binary data).
If you have to put that 1 MB XML file under Mercurial's control, it's
almost certainly better to leave it uncompressed: Mercurial will
compress it internally, and you'll get better deltas on uncompressed
data.
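
If you want to see the effect for yourself, here is a rough Python
sketch.  It is not Mercurial's actual delta code: as a crude stand-in
for "how big is the delta from the old revision to the new one", it
deflates the new revision using the old one as a preset zlib
dictionary.  The helper names (make_text, approx_delta_size) and the
XML-ish test data are made up for illustration, and the data is kept
under 32 kB so the old revision fits in zlib's window.

```python
import hashlib
import zlib


def make_text(n_lines, edited=False):
    """Build deterministic XML-ish text; optionally sprinkle in small edits."""
    lines = []
    for i in range(n_lines):
        token = hashlib.sha256(str(i).encode()).hexdigest()[:12]
        lines.append(f'<item id="{i:04d}" value="{token}"/>\n')
        if edited and i % 40 == 7:
            # A small insertion, like a typical edit between two revisions.
            lines.append(f'<note>edit after item {i}</note>\n')
    return "".join(lines).encode()


def approx_delta_size(old, new):
    """Crude delta estimate (an assumption, not revlog's algorithm):
    deflate `new` with `old` as the preset dictionary and count bytes."""
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


# Two revisions of the same file, stored uncompressed...
old_raw = make_text(500)
new_raw = make_text(500, edited=True)

# ...and the same two revisions compressed before being committed.
old_packed = zlib.compress(old_raw)
new_packed = zlib.compress(new_raw)

raw_delta = approx_delta_size(old_raw, new_raw)
packed_delta = approx_delta_size(old_packed, new_packed)

print("new revision, uncompressed size:", len(new_raw))
print("new revision, compressed size:  ", len(new_packed))
print("approx delta, uncompressed revs:", raw_delta)
print("approx delta, compressed revs:  ", packed_delta)
```

The exact numbers depend on your zlib build, but the delta between the
uncompressed revisions should come out much smaller than the delta
between the pre-compressed ones, which is the whole point above.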

Greg

