Manifest v2 Plan

Use of the tree manifests described in TreeManifestPlan will result in a new manifest hash, so now is a good time to introduce a new manifest format. Much of the below comes from ImprovingManifestCompressionPlan.

Format

To identify the manifest as being version 2, the first line will start with a null byte (an empty path, which is disallowed in v1). What follows it is a header:

\0{metadata}\n

The metadata field stores key/value pairs, with each pair separated by a null byte. The value is separated from the key by a colon (:).

For example:

\0treemanifest:\n
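
As a rough illustration of the header layout, here is a minimal Python sketch that detects a v2 manifest and splits the metadata into key/value pairs. The function name parse_v2_header and its return shape are hypothetical and not part of Mercurial; the sketch only assumes what is stated above (leading null byte, pairs separated by null bytes, key and value separated by the first colon).

```python
def parse_v2_header(data: bytes):
    """Return (metadata, offset of the first file entry) for a v2 manifest.

    Hypothetical helper for illustration only.
    """
    # A v2 manifest is recognised by the null byte that starts its first line.
    if not data.startswith(b"\0"):
        raise ValueError("not a v2 manifest: first byte is not a null byte")
    end = data.find(b"\n")
    if end == -1:
        raise ValueError("v2 header is missing its terminating newline")

    metadata = {}
    raw = data[1:end]
    if raw:
        # Pairs are separated by null bytes; key and value by the first colon.
        for pair in raw.split(b"\0"):
            key, _, value = pair.partition(b":")
            metadata[key] = value
    return metadata, end + 1


metadata, offset = parse_v2_header(b"\0treemanifest:\n")
print(metadata, offset)
```

For the example header above, this yields a single key, treemanifest, with an empty value.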

File entries

The current format is:

{path}\0{40-byte hex nodeid}{flags}\n

The proposal is to change it to:

{path}\0{flags[\0{metadata}]}\n{20-byte binary nodeid}\n

The hash is binary, saving 20 bytes per entry (19 net, since the nodeid's own line adds a newline), and is on a separate line to make deltas smaller.

If the flags field ends with a null byte, what follows it (until end of line) is metadata. The format is the same as the header metadata. There are no current plans for what will be stored in this field.
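
The two entry layouts can be compared with a short Python sketch. The helper names (encode_v1_entry, encode_v2_entry, decode_v2_entry) are hypothetical, and the sketch ignores stem compression. One assumption worth calling out: the nodeid is read by fixed length rather than by searching for a newline, because a binary hash may itself contain a newline byte.

```python
from binascii import hexlify


def encode_v1_entry(path: bytes, node: bytes, flags: bytes = b"") -> bytes:
    # {path}\0{40-byte hex nodeid}{flags}\n
    return path + b"\0" + hexlify(node) + flags + b"\n"


def encode_v2_entry(path: bytes, node: bytes, flags: bytes = b"",
                    metadata: bytes = None) -> bytes:
    # {path}\0{flags[\0{metadata}]}\n{20-byte binary nodeid}\n
    if len(node) != 20:
        raise ValueError("nodeid must be 20 bytes")
    first_line = path + b"\0" + flags
    if metadata is not None:
        first_line += b"\0" + metadata
    return first_line + b"\n" + node + b"\n"


def decode_v2_entry(data: bytes, pos: int = 0):
    """Return (path, node, flags, metadata pairs, position of next entry)."""
    end = data.find(b"\n", pos)
    if end == -1:
        raise ValueError("incomplete file entry")
    items = data[pos:end].split(b"\0")
    if len(items) < 2:
        raise ValueError("entry has no flags field")
    path, flags, metadata = items[0], items[1], items[2:]
    # Fixed-length read: the binary nodeid may contain b"\n".
    node = data[end + 1:end + 21]
    if len(node) != 20 or data[end + 21:end + 22] != b"\n":
        raise ValueError("malformed nodeid line")
    return path, node, flags, metadata, end + 22


node = bytes(range(20))  # deliberately contains a newline byte (0x0a)
v1 = encode_v1_entry(b"dir/file.txt", node, b"x")
v2 = encode_v2_entry(b"dir/file.txt", node, b"x")
print(len(v1) - len(v2))
```

With identical path and flags, the v2 entry comes out 19 bytes shorter: 20 bytes saved on the hash, one spent on the extra newline.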

With stem compression, we would simply replace the first byte of the path with the number of bytes to copy from the previous path.
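
A minimal Python sketch of stem compression, reading the sentence above as "each path is stored as one count byte followed by the part of the path that differs from the previous one". That reading, the 255-byte cap implied by a single count byte, and the function names are all assumptions made for illustration.

```python
def common_prefix_len(a: bytes, b: bytes, limit: int = 255) -> int:
    """Length of the shared prefix of a and b, capped so it fits in one byte."""
    n = min(len(a), len(b), limit)
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def stem_compress(paths):
    """Encode each path as <count byte><suffix> relative to the previous path."""
    prev = b""
    out = []
    for path in paths:
        stem = common_prefix_len(prev, path)
        out.append(bytes([stem]) + path[stem:])
        prev = path
    return out


def stem_decompress(items):
    """Inverse of stem_compress."""
    prev = b""
    out = []
    for item in items:
        stem = item[0]
        path = prev[:stem] + item[1:]
        out.append(path)
        prev = path
    return out


paths = [b"dir/a.txt", b"dir/b.txt", b"dir/sub/c.txt", b"other"]
packed = stem_compress(paths)
print(packed)
```

Because manifest paths are sorted, neighbouring entries tend to share long directory prefixes, which is where the saving comes from.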

Space savings (or NOT)

Storage sizes (post gzip compression) in bytes when run on the first revisions of the Mozilla repo:

                                    v1        v2 without stem compression   v2 with stem compression
Full revision (rev 1 of the repo)   769307    674620 (-12% vs v1)           634897 (-17% vs v1, -6% vs no stem compression)
Next 4999 revisions                 1583141   1000899 (-37% vs v1)          974380 (-38% vs v1, -3% vs no stem compression)
First 5000 revisions                2352448   1675519 (-29% vs v1)          1609277 (-32% vs v1, -4% vs no stem compression)

HOWEVER, when run on the entire history of mozilla-unified, the space usage increases. With 336202 revisions in the repo, 00manifest.d went from 163M to 277M. The hg debugrevlog -m output follows. As you can see, the average uncompressed size goes down to about 40% of the v1 value, but the average chain length also goes down, to about 20% of the v1 value, meaning we emit more full manifests. The compression ratio (I'm not sure how that's measured) also goes down to about 20% of the v1 value.

Manifest v1:

format : 1
flags  : generaldelta

revisions     :    335351
    merges    :     15769 ( 4.70%)
    normal    :    319582 (95.30%)
revisions     :    335351
    full      :       202 ( 0.06%)
    deltas    :    335149 (99.94%)
revision size : 170718580
    full      :  22434375 (13.14%)
    deltas    : 148284205 (86.86%)

avg chain length  : 14234
max chain length  : 36788
compression ratio : 15829

uncompressed data size (min/max/avg) : 51 / 14689628 / 8058636
full revision size (min/max/avg)     : 52 / 3783061 / 111061
delta size (min/max/avg)             : 0 / 1585383 / 442

deltas against prev  : 252069 (75.21%)
    where prev = p1  : 250597     (99.42%)
    where prev = p2  :   1393     ( 0.55%)
    other            :     79     ( 0.03%)
deltas against p1    :  76206 (22.74%)
deltas against p2    :   6874 ( 2.05%)
deltas against other :      0 ( 0.00%)

Manifest v2:

format : 1
flags  : generaldelta

revisions     :    335195
    merges    :     15733 ( 4.69%)
    normal    :    319462 (95.31%)
revisions     :    335195
    full      :       227 ( 0.07%)
    deltas    :    334968 (99.93%)
revision size : 290313187
    full      :  56582578 (19.49%)
    deltas    : 233730609 (80.51%)

avg chain length  : 2381
max chain length  : 11123
compression ratio : 3524

uncompressed data size (min/max/avg) : 35 / 5337933 / 3052630
full revision size (min/max/avg)     : 35 / 3105270 / 249262
delta size (min/max/avg)             : 0 / 1598061 / 697

deltas against prev  : 257137 (76.76%)
    where prev = p1  : 252886     (98.35%)
    where prev = p2  :   4102     ( 1.60%)
    other            :    149     ( 0.06%)
deltas against p1    :  74190 (22.15%)
deltas against p2    :   3641 ( 1.09%)
deltas against other :      0 ( 0.00%)
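
The percentages quoted before the two outputs can be recomputed directly from them. The short Python sketch below only copies figures from the debugrevlog output above; the dictionary keys and the function name percent_of_v1 are chosen here for illustration.

```python
# Figures copied from the two `hg debugrevlog -m` outputs above.
V1 = {
    "revision size": 170718580,
    "avg uncompressed data size": 8058636,
    "avg chain length": 14234,
    "compression ratio": 15829,
}
V2 = {
    "revision size": 290313187,
    "avg uncompressed data size": 3052630,
    "avg chain length": 2381,
    "compression ratio": 3524,
}


def percent_of_v1(v1, v2):
    """Express each v2 figure as a rounded percentage of the v1 figure."""
    return {key: round(100 * v2[key] / v1[key]) for key in v1}


for key, pct in percent_of_v1(V1, V2).items():
    print(f"{key}: v2 is {pct}% of v1")
```

This gives roughly 170% for total revision size, 38% for average uncompressed size, 17% for average chain length and 22% for compression ratio.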

Backwards compatibility

When the user has set the config (experimental.manifestv2 for now), any new commit will be written using the new format, and we'll add an entry to requires at that point. Cloning from a v1 repo results in a v1 repo. Cloning from a v2 repo results in a v2 repo (meaning requires contains manifestv2). Pulling from a v2 repo into a v1 repo will be allowed only if the experimental.manifestv2 config is set. Similarly, pushing from a v2 repo into a v1 repo will be allowed only if the config is set on the destination repo.
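
The rules above can be summarised in a small Python sketch. This is not Mercurial code: the function names are hypothetical, and the cases the text does not address (for example v1 data entering a v2 repo) are simply assumed to be allowed.

```python
MANIFESTV2 = "manifestv2"  # the requires entry named in the text


def clone_requirements(source_requires):
    """A clone ends up with the same manifest format as its source."""
    return {MANIFESTV2} & set(source_requires)


def may_receive(source_requires, dest_requires, dest_config_set: bool) -> bool:
    """Whether changesets may flow from source into dest (pull or push).

    dest_config_set stands for experimental.manifestv2 being set on the
    receiving repo.
    """
    source_v2 = MANIFESTV2 in source_requires
    dest_v2 = MANIFESTV2 in dest_requires
    if source_v2 and not dest_v2:
        # v2 data entering a v1 repo requires the config on the receiver.
        return dest_config_set
    # Assumption: all other combinations are allowed; the text does not say.
    return True


print(may_receive({"manifestv2"}, set(), dest_config_set=False))
```

Both the pull rule and the push rule reduce to the same check on the receiving repo, which is why a single function covers them here.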

TBD: Will we convert to old format on the fly for exchange?

Readdelta

In a few places, we use manifest deltas without resolving the entire manifest. One example is hg verify, which slows down a lot when it cannot take advantage of reading deltas (a naive test shows >5x on the Mozilla repo). Since the new format splits each file entry across two lines, a delta for a modification will not include the file path, which means it will not be useful. Therefore, we will not read deltas for v2 manifests. The problem, however, is that it's not obvious whether a delta being read is in the old or the new format. There seem to be a few options here:

  1. Never read deltas when using the new format (as indicated by the requires entry) and accept that some operations will be slower.
  2. Look closer at the delta content. It seems like a delta of the old format could not be mistaken for a delta of the new format, or vice versa.
  3. Modify the format so it's trivial to determine whether it's a v1 or v2 manifest by reading a delta. This can be done by prepending every line by a null byte (empty path).

Note that the latter two options involve reading the delta only to have to read the full content anyway if the delta is for the new format.
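
As a rough sketch of what option 3 would allow, the Python function below classifies a delta by looking at the first byte of the data it adds. The function name and the input shape (a list of byte strings, one per added hunk, each starting at a line boundary) are assumptions for illustration, not Mercurial's delta API.

```python
def delta_is_v2(added_chunks):
    """Guess the manifest version from the data a delta adds.

    Returns True for v2, False for v1, and None when the delta adds no
    data and therefore gives no hint.
    """
    for chunk in added_chunks:
        if chunk:
            # Under option 3 every v2 line starts with a null byte, while a
            # v1 line starts with a non-empty path and so never does.
            # Caveat: a binary nodeid may contain a newline byte, so a
            # line-based delta could begin mid-nodeid; not handled here.
            return chunk.startswith(b"\0")
    return None


v1_chunk = b"dir/file.txt\0" + b"0" * 40 + b"\n"
v2_chunk = b"\0dir/file.txt\0\n"
print(delta_is_v2([v1_chunk]), delta_is_v2([v2_chunk]), delta_is_v2([]))
```

A caller would fall back to resolving the full manifest whenever the result is True or None.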

ManifestV2Plan (last edited 2016-11-18 17:44:49 by MartinVonZweigbergk)