Differences between revisions 8 and 9
Revision 8 as of 2013-03-18 19:21:40
Size: 4590
Editor: KevinBullock
Comment: add to CategoryNewFeatures
Revision 9 as of 2014-03-07 19:26:26
Size: 1390
Comment:
Line 1: Line 1:
This page describes the second iteration of the bundle format, tentatively called HG19 (since we hoped to include it with Mercurial 1.9).
<<Include(A:dev)>>
Line 3: Line 3:
Bundles consist of the following sections:
 * A bundle header, describing the version and features present in the data.
 * A changegroups section containing the changelog, the manifest and each relevant filelog.
 * Optionally, a footer containing an index for more efficient random access?
This page describes the current plan for a more modern and complete bundle format. (For the old content of this page, see BundleFormatHG19.)
Line 10: Line 7:
== Nomenclature ==
For the sake of this document, the following (otherwise often quite ambiguous) terms are used:
(current content is copy pasted from 2.9 sprint note)
Line 13: Line 9:
'''bundle'''
 Data in the format described in this document, including all headers. Could also be called full bundles.
'''headerless bundle'''
 A bundle without the first 6 bytes of the header, containing the version identifier and compression type. These have traditionally been used internally and in the wire protocol, and are always uncompressed.
'''chunk'''
 Data corresponding to a single revlog entry.
'''changegroup'''
 A list of chunks containing revlog entries. Sometimes called chunkgroup.
New bundle format
Line 22: Line 11:
== Sections ==
For each section, the offsets are given relative to the beginning of the section. Fields with unknown length are assigned constants a, b, c etc.
lightweight
new manifest
general delta
bookmarks
phase boundaries
obsolete markers
>sha1 support
pushkey
extensible for new features (required and optional)
progress information
resumable?
transaction commit markers?
    It's possible to envision a format that sends a change, its manifest, and filenodes in each chunk rather than sending all changesets, then all manifests, etc.
capabilities
Line 25: Line 26:
=== Header ===
The format of the bundle header is described below. Traditionally, the first part of the header (only part in the existing
format), is often left out in internal processing and over the wire. This part consists of the first 6 bytes up to and including
the compression type. In such cases, the bundles are always considered to be uncompressed. It has not been decided what we
will do with the new bundle format.
New header:
Line 31: Line 28:
|| '''Offset''' || '''Size''' || '''Type''' || '''Description''' ||
||<)> 0 ||<)> 4 || string || Bundle format version. Always contains "HG19". ||
||<)> 4 ||<)> 2 || string || Compression type. Either "BZ", "GZ" or "UN". ||
||<)> 6 ||<)> 4 || uint || Length of feature string, in bytes. ||
||<)> 10 ||<)> a || string || Bundle features (or requirements). A list of newline separated strings describing features present in the bundle (unterminated). ||
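
The header layout in the table above can be read with a few lines of Python. This is a minimal sketch, not Mercurial's implementation: the function name `parse_bundle_header` is hypothetical, and the big-endian byte order of the length field is an assumption (the table only says "uint").

```python
import struct

def parse_bundle_header(data):
    """Parse an HG19 bundle header as laid out in the table above.

    Returns (compression, features, offset), where offset is the position
    of the first byte after the header.
    """
    if len(data) < 10:
        raise ValueError("header needs at least 10 bytes")
    version = data[0:4]
    if version != b"HG19":
        raise ValueError("unexpected bundle version: %r" % version)
    compression = data[4:6]
    if compression not in (b"BZ", b"GZ", b"UN"):
        raise ValueError("unknown compression type: %r" % compression)
    # Assumption: the 4-byte length is big-endian.
    (feature_len,) = struct.unpack(">I", data[6:10])
    end = 10 + feature_len
    if len(data) < end:
        raise ValueError("truncated feature string")
    raw = data[10:end]
    # Features are newline separated and unterminated.
    features = raw.split(b"\n") if raw else []
    return compression, features, end
```

The returned offset marks where the changegroups section would begin.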
type Header struct {
    length uint32
    lNode byte
    node [lNode]byte
Line 37: Line 33:
=== Changegroups section ===
The changegroups section has the following format:
    // if empty (lP1 ==0) then default to previous node in the stream
    lP1 byte
    p1 [lP1]byte
Line 40: Line 37:
|| '''Offset''' || '''Size''' || '''Type''' || '''Description''' ||
||<)> 0 ||<)> 4 || uint || Number of changelog entries. ||
||<)> 4 ||<)> b || group || Changegroup containing changelog entries. ||
||<)> b + 4 ||<)> 4 || uint || Number of manifest entries. ||
||<)> b + 8 ||<)> c || group || Changegroup containing manifest entries. ||
||<)> b + c + 8 ||<)> 4 || uint || Number of filelog changegroups (note: not the number of entries). ||
    // if empty, nullrev
    lP2 byte
    p2 [lP2]byte
Line 47: Line 41:
Then, for each filelog, the following:
    // if empty, self (for changelogs)
    lLinknode byte
    linknode [lLinknode]byte
Line 49: Line 45:
|| '''Offset''' || '''Size''' || '''Type''' || '''Description''' ||
||<)> 0 ||<)> 4 || uint || Number of filelog entries. ||
||<)> 4 ||<)> 4 || uint || Length of filename, in bytes. ||
||<)> 8 ||<)> d || string || Filename (unterminated). ||
||<)> d + 8 ||<)> e || group || Changegroup containing filelog entries. ||
    // if empty, p1
    lDeltaParent byte
    deltaParent [lDeltaParent]byte
}
Line 55: Line 50:
The changegroup format is described below.
Line 57: Line 51:
== Changegroups ==
A changegroup consists of a number of chunks describing revisions. Each chunk has the following format:
We'll modify the existing changegroup type so it can pretend to be a new changegroup that just has a variety of empty fields. Progress information fields might be optional.
Line 60: Line 53:
|| '''Offset''' || '''Size''' || '''Type''' || '''Description''' ||
||<)> 0 ||<)> 4 || uint || Total length of the chunk, including the 104-byte header described here. ||
||<)> 4 ||<)> 20 || sha-1 hash || Node of this revision. ||
||<)> 24 ||<)> 20 || sha-1 hash || First parent of this revision. ||
||<)> 44 ||<)> 20 || sha-1 hash || Second parent of this revision (or 0-bytes). ||
||<)> 64 ||<)> 20 || sha-1 hash || Link pointer back to the changelog. ||
||<)> 84 ||<)> 20 || sha-1 hash || Parent for the delta (or 0-bytes for a snapshot). ||
||<)> 104 ||<)> f || data || Delta or full version snapshot. ||
Line 69: Line 54:
So in the above table, we always have ''chunk length = f + 104.''
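
As an illustration of the HG19 chunk layout, the sketch below splits one chunk into its fields and uses the relation ''chunk length = f + 104'' to find the payload. The name `parse_chunk` and the dictionary keys are hypothetical, and big-endian byte order for the length field is an assumption.

```python
import struct

CHUNK_HEADER_SIZE = 104      # 4-byte length + five 20-byte hashes
ZERO_HASH = b"\0" * 20       # "0-bytes" value used for absent hashes

def parse_chunk(data, offset=0):
    """Parse one chunk starting at offset; return (chunk, next_offset)."""
    # Assumption: the length field is big-endian.
    (length,) = struct.unpack_from(">I", data, offset)
    if length < CHUNK_HEADER_SIZE:
        raise ValueError("chunk shorter than its own header")
    if offset + length > len(data):
        raise ValueError("truncated chunk")
    node, p1, p2, linknode, delta_parent = struct.unpack_from(
        ">20s20s20s20s20s", data, offset + 4
    )
    # The payload is the f = length - 104 bytes following the header.
    payload = data[offset + CHUNK_HEADER_SIZE:offset + length]
    chunk = {
        "node": node,
        "p1": p1,
        "p2": p2,
        "linknode": linknode,
        "delta_parent": delta_parent,
        # A zero delta parent marks a full snapshot rather than a delta.
        "is_snapshot": delta_parent == ZERO_HASH,
        "data": payload,
    }
    return chunk, offset + length
```

Calling `parse_chunk` repeatedly with the returned offset walks through the chunks of a changegroup.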

== Further requirement ==

Additional features have landed in Mercurial since this design. We also wish to support the following data in a bundle:

 * light weight copy support (http://bz.selenic.com/show_bug.cgi?id=883)
 * [[Phases]] data
 * ChangesetsObsolescence marker
 * Bookmark updates

Note:

This page is primarily intended for developers of Mercurial.

This page describes the current plan for a more modern and complete bundle format. (For the old content of this page, see BundleFormatHG19.)

(current content is copy pasted from 2.9 sprint note)

New bundle format

lightweight
new manifest
general delta
bookmarks
phase boundaries
obsolete markers
>sha1 support
pushkey
extensible for new features (required and optional)
progress information
resumable?
transaction commit markers?

  • It's possible to envision a format that sends a change, its manifest, and filenodes in each chunk rather than sending all changesets, then all manifests, etc.

capabilities

New header:

type Header struct {
    length uint32
    lNode byte
    node [lNode]byte

    // if empty (lP1 == 0) then default to previous node in the stream
    lP1 byte
    p1 [lP1]byte

    // if empty, nullrev
    lP2 byte
    p2 [lP2]byte

    // if empty, self (for changelogs)
    lLinknode byte
    linknode [lLinknode]byte

    // if empty, p1
    lDeltaParent byte
    deltaParent [lDeltaParent]byte
}
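
To make the defaulting rules in the comments concrete, here is a rough Python sketch of a reader for this header. Several points are assumptions rather than decisions from the sprint notes: the names `parse_new_header` and `_read_field` are hypothetical, the `length` field is read as big-endian and returned without interpretation (the notes do not say what it covers), and "nullrev" is represented as zero bytes of the same width as `node`.

```python
import struct

def _read_field(data, offset):
    """Read one field: a 1-byte length followed by that many bytes."""
    if offset >= len(data):
        raise ValueError("truncated header")
    size = data[offset]
    start = offset + 1
    end = start + size
    if end > len(data):
        raise ValueError("truncated field")
    return data[start:end], end

def parse_new_header(data, offset=0, prev_node=b""):
    """Parse one header and apply the documented defaults for empty fields."""
    # Assumption: big-endian; the meaning of length is left to the caller.
    (length,) = struct.unpack_from(">I", data, offset)
    pos = offset + 4
    node, pos = _read_field(data, pos)
    p1, pos = _read_field(data, pos)
    p2, pos = _read_field(data, pos)
    linknode, pos = _read_field(data, pos)
    delta_parent, pos = _read_field(data, pos)
    if not p1:
        p1 = prev_node                 # default: previous node in the stream
    if not p2:
        p2 = b"\0" * len(node)         # assumption: nullrev as zero bytes
    if not linknode:
        linknode = node                # default: self (for changelogs)
    if not delta_parent:
        delta_parent = p1              # default: p1
    header = {
        "length": length,
        "node": node,
        "p1": p1,
        "p2": p2,
        "linknode": linknode,
        "delta_parent": delta_parent,
    }
    return header, pos
```

Because every hash carries its own length byte, the same reader works for 20-byte SHA-1 nodes and for longer hashes.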

We'll modify the existing changegroup type so it can pretend to be a new changegroup that just has a variety of empty fields. Progress information fields might be optional.
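
One way to picture an existing changegroup entry "pretending" to be a new one is an encoder that writes an empty field wherever the value equals its documented default. The sketch below is only an illustration under assumptions: `encode_header` and `_field` are hypothetical names, the `length` value is passed through unchanged and written big-endian, and the null parent is taken to be 20 zero bytes as in the existing SHA-1 format.

```python
import struct

NULL_ID = b"\0" * 20  # assumption: null parent as in the SHA-1 based format

def _field(value):
    """Encode one field: a 1-byte length followed by the raw bytes."""
    if len(value) > 255:
        raise ValueError("field too long for a 1-byte length")
    return bytes([len(value)]) + value

def encode_header(node, p1, p2, linknode, delta_parent,
                  prev_node=None, length=0):
    """Encode the header fields, leaving defaulted fields empty."""
    parts = [struct.pack(">I", length), _field(node)]
    # Empty p1 means "previous node in the stream".
    parts.append(_field(b"" if p1 == prev_node else p1))
    # Empty p2 means nullrev.
    parts.append(_field(b"" if p2 == NULL_ID else p2))
    # Empty linknode means "self" (the changelog case).
    parts.append(_field(b"" if linknode == node else linknode))
    # Empty delta parent means p1.
    parts.append(_field(b"" if delta_parent == p1 else delta_parent))
    return b"".join(parts)
```

For a linear run of changelog entries, every field except `node` collapses to a single zero byte, which is where the "lightweight" goal would show up on the wire.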


CategoryNewFeatures

BundleFormat2 (last edited 2018-02-10 00:05:58 by AviKelman)