Optimizing manifest format

Benoit Boissinot bboissin at gmail.com
Fri Jun 8 15:34:42 CDT 2012


On Fri, Jun 8, 2012 at 6:13 AM, Wagner Bruna <wagner.bruna+mercurial at gmail.com> wrote:

> Hello,
>
> Bryan O'Sullivan's recent tests with revlog compression motivated me to
> test an idea for enhancing manifest compression by changing the format
> actually stored in its revlog.
>
> The following script encodes manifest data like
>
> "a/b[hexid1][mode]\n
> a/c[hexid2][mode]\n
> d[hexid3][mode]\n"
>
> as:
>
> "a/b[mode]
> //c[mode]
> ///d[mode]
> [binid3][binid2][binid1]"
>
> The file paths are encoded as a serialized tree in pre-order, together
> with mode information. This avoids unneeded path name repetition and
> groups related files more closely (changes to files in the same subtree
> end up next to each other in the output, hopefully improving delta storage).
>
> Node ids are stored in binary form, and grouped at the end, to avoid
> mixing compressible (paths) and incompressible (hashes) data. Again,
> related file changes will hopefully tend to cluster together.
>
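The script itself is not included in the message, so what follows is only a minimal Python sketch of the idea under stated assumptions, not the original implementation. The slash rule is inferred from the three-line example alone. The first entry is written in full. Each later entry starts with one slash plus one more slash for every component of the previous path that must be dropped to reach the shared directory, followed by the remaining components. The function names encode_manifest and decode_manifest, the NUL byte separating a path from its mode, and the 20-byte binary IDs are assumptions made for illustration. The IDs are appended in reverse order because the example lists them that way.

```python
# Minimal sketch of the re-encoded manifest format described above.
# Assumptions (not taken from the original script): entries are
# (path, hex_node, flags) tuples sorted by path, a NUL byte separates the
# path text from its flags, and node IDs are 20-byte binary hashes.

def encode_manifest(entries):
    lines, ids, prev = [], [], None
    for path, hexnode, flags in entries:
        parts = path.split("/")
        if prev is None:
            text = path  # first entry is written in full
        else:
            # Count leading directory components shared with the previous path.
            limit = min(len(prev), len(parts)) - 1
            common = 0
            while common < limit and prev[common] == parts[common]:
                common += 1
            # One slash marks a relative entry, plus one per component of the
            # previous path dropped to reach the shared directory.
            text = "/" * (1 + len(prev) - common) + "/".join(parts[common:])
        lines.append(text + "\0" + flags + "\n")
        ids.append(bytes.fromhex(hexnode))
        prev = parts
    # Compressible text first, then binary IDs in reverse order (as in the example).
    return "".join(lines).encode("utf-8") + b"".join(reversed(ids))


def decode_manifest(data):
    texts, pos = [], 0
    # The ID block starts at the first point where exactly 20 bytes per line remain.
    while len(data) - pos != 20 * len(texts):
        end = data.index(b"\n", pos)
        texts.append(data[pos:end].decode("utf-8"))
        pos = end + 1
    count = len(texts)
    ids = [data[pos + 20 * i:pos + 20 * (i + 1)] for i in range(count)]
    ids.reverse()  # undo the reversed storage order

    entries, prev = [], None
    for text, node in zip(texts, ids):
        name, flags = text.split("\0")
        slashes = len(name) - len(name.lstrip("/"))
        if prev is None:
            parts = name.split("/")
        else:
            keep = len(prev) - (slashes - 1)  # drop (slashes - 1) components
            parts = prev[:keep] + name[slashes:].split("/")
        entries.append(("/".join(parts), node.hex(), flags))
        prev = parts
    return entries


entries = [("a/b", "11" * 20, ""), ("a/c", "22" * 20, "x"), ("d", "33" * 20, "")]
blob = encode_manifest(entries)
print(blob[:17])  # b'a/b\x00\n//c\x00x\n///d\x00\n'
assert decode_manifest(blob) == entries
```

The text section has exactly one line per entry and each ID is 20 bytes. Because of that, the decoder can locate the ID block without a header: it stops at the first line count k for which exactly 20 * k bytes remain, and no smaller k can satisfy that condition.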

The disadvantage is that the manifest becomes harder to parse partially, and
harder to extract information from a binary delta alone (which is quite useful).
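To make this objection concrete, here is a small hedged Python sketch. A line in Mercurial's existing manifest format (path, NUL, 40 hex digits, optional flag, newline) can be parsed on its own, for example when it shows up in a delta. The tree encoding behaves differently. Assume the slash rule suggested by the example in the quoted message (this is an assumption, since the script is not shown). Under that rule, a line such as "//c" cannot be resolved without the previous full path, and finding its binary ID requires the total number of entries. The helpers parse_plain_line, resolve_encoded_name and id_offset are hypothetical names used only for this illustration.

```python
def parse_plain_line(line):
    # Existing format: b"path\0" + 40 hex digits + optional flag + b"\n".
    path, rest = line.rstrip(b"\n").split(b"\0", 1)
    return path.decode("utf-8"), rest[:40].decode("ascii"), rest[40:].decode("ascii")


def resolve_encoded_name(name, prev_path):
    # Assumed rule: a leading run of n slashes means "drop n - 1 components
    # of the previous path, then append the rest"; no slashes means a full path.
    slashes = len(name) - len(name.lstrip("/"))
    if slashes == 0:
        return name
    if prev_path is None:
        raise ValueError("relative entry needs the previous full path")
    prev = prev_path.split("/")
    keep = len(prev) - (slashes - 1)
    return "/".join(prev[:keep] + name[slashes:].split("/"))


def id_offset(index, count):
    # With IDs stored in reverse order after the text (as in the example),
    # an entry's position in the ID block depends on the total entry count.
    return 20 * (count - 1 - index)


print(parse_plain_line(b"a/c\x00" + b"22" * 20 + b"x\n")[0])  # a/c
print(resolve_encoded_name("//c", "a/b"))   # a/c
print(resolve_encoded_name("///d", "a/c"))  # d
print(id_offset(0, 3))                      # 40
```

A delta that touches only the line for a/c is self-describing in the plain format. In the sketched encoding, the same delta yields a relative name and an ID whose position depends on context the delta does not carry.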

Benoit


More information about the Mercurial-devel mailing list