Optimizing manifest format

Matt Mackall mpm at selenic.com
Sat Jun 9 10:30:24 CDT 2012


On Fri, 2012-06-08 at 22:34 +0200, Benoit Boissinot wrote:
> On Fri, Jun 8, 2012 at 6:13 AM, Wagner Bruna <
> wagner.bruna+mercurial at gmail.com> wrote:
> 
> > Hello,
> >
> > Bryan O'Sullivan's recent tests with revlog compression motivated me to
> > test an idea for enhancing manifest compression by changing the format
> > actually stored in its revlog.
> >
> > The following script encodes manifest data like
> >
> > "a/b[hexid1][mode]\n
> > a/c[hexid2][mode]\n
> > d[hexid3][mode]\n"
> >
> > as:
> >
> > "a/b[mode]
> > //c[mode]
> > ///d[mode]
> > [binid3][binid2][binid1]"
> >
> > The file paths are encoded as a serialized tree in pre-order, together
> > with mode information. This avoids unneeded path name repetition, and
> > groups related files more closely (file changes in the same subtree will
> > be grouped in the output, hopefully improving delta storage).
> >
> > Node ids are stored in binary form, and grouped at the end, to avoid
> > mixing compressible (paths) and incompressible (hashes) data. Again,
> > related file changes will hopefully tend to cluster together.
> >
> 
> The disadvantage is that it becomes harder to parse partially, or to
> extract information from just a binary delta (which is quite useful).

Indeed. We definitely need to be able to look at the contents of
manifest deltas independently of the rest of the manifest; otherwise
various commands can become -very- slow, verify and pull -r in
particular.
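
To illustrate why the flat format makes this cheap, here is a rough
sketch (not Mercurial's actual code). It assumes each manifest line is
self-contained: a path, a NUL separator, 40 hex digits and an optional
mode flag. The NUL separator is an assumption, since the example quoted
above does not show one. The (start, end, data) hunk tuples and both
function names are hypothetical stand-ins for a line-aligned binary
delta. The point is only that the lines a delta adds can be parsed
without ever touching the base manifest:

```python
import binascii


def parse_entry(line):
    """Parse one flat manifest line into (path, binary node, flags).

    Assumed layout: path, NUL, 40 hex digits, optional flags.
    """
    path, rest = line.split(b"\0", 1)
    node = binascii.unhexlify(rest[:40])
    flags = rest[40:].decode("ascii")
    return path.decode("utf-8"), node, flags


def entries_in_delta(hunks):
    """Yield an entry for every line a delta adds or rewrites.

    `hunks` is a hypothetical list of (start, end, data) tuples whose
    data consists of whole lines. The base text is never consulted.
    """
    for _start, _end, data in hunks:
        for line in data.split(b"\n"):
            if line:
                yield parse_entry(line)


# One hunk that replaces a single line of some unseen base manifest.
hunks = [(0, 45, b"a/b\0" + b"1" * 40 + b"x\n")]
changed = list(entries_in_delta(hunks))
```

With the tree-plus-trailing-ids layout proposed above, a changed path
and its node id end up in different parts of the text, so the same
delta could not be interpreted without the surrounding manifest.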

-- 
Mathematics is the supreme nostalgia of our time.



