Binary diffs and largefiles

Matt Mackall mpm at selenic.com
Fri Sep 21 14:54:07 CDT 2012


On Fri, 2012-09-21 at 10:26 -0500, dukeofgaming wrote:
> The reason I was looking into this is because we need to implement a
> better mechanism to distribute release updates (this is for an
> embedded system that runs on a single core OMAP platform [say,
> something of the capabilities of a good single-core smartphone]).
> These updates would most probably be several ISOs.
> 
> 
> So, just asking from the inverse perspective: do you think Mercurial
> would be suited as a bsdiff alternative to some degree (because of the
> memory consumption)?, and even maybe for distributing updates?

>From a big-picture perspective, there are basically two ways to do
deltas:

- whole file approaches (classic diff, bsdiff, Mercurial's internal
bdiff)
- windowed streaming approaches (rsync, xdelta, libxdiff)

The former take O(N**2) time and O(N) memory and can be tuned to give
"optimal" results in terms of edit distance, which isn't actually a
measure of compression. The latter take O(N) time and O(1) memory (in
other words, they scale) and may or may not give better compression.

So if you're actually dealing with single files approaching memory size
(like ISOs) on a small system, you probably need to use the latter. If
you're instead dealing with lots of medium-sized files, algorithms from
the former may or may not give better results.

Beyond that, Mercurial is not designed as a distribution system.
Presumably you don't want your clients to need to keep an ever-growing
copy of the entire history of your project locally (which Mercurial's
target audience considers to be a feature). Instead, you just want to
move them from release X to release Y with a minimum of additional
storage and bandwidth. That sounds like rsync batch mode.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list