Differences between revisions 32 and 34 (spanning 2 versions)
Revision 32 as of 2015-06-12 07:47:48
Size: 6501
Revision 34 as of 2018-02-10 00:05:58
Size: 2056
Editor: AviKelman
#pragma section-numbers 2
<<Include(A:dev)>>
<<Include(A:historic)>>
{{{#!wiki caution
<!> This plan has been carried out; check the in-code documentation. This information was derived by reverse engineering. Some details may be incomplete. Hopefully someone with intimate familiarity with the code can improve it.}}}
= BundleFormat2 =
This page describes the current plan for a more modern and complete bundle format. (For the old content of this page, see [[BundleFormatHG19]].)
The v2 bundle file format is in practice quite similar to v1 (see BundleFormat), in that it comprises a file header followed by a changegroup, but it differs in a few significant ways.
<<TableOfContents>>
== Practical differences from v1 bundles ==
 * The file has a more verbose multi-stage ASCII header containing key:value pairs. (more below)
 * Zstandard compression is also supported (and is the new default).
 * Uses version 2 deltagroup headers instead of version 1. (see the spec at [[Topic:internals.changegroups|help internals.changegroups]])
 * Everything after the header is shredded into N-byte chunks after it is assembled (N is a parameter defined in the source code).
(current content is copy pasted from 2.9 sprint note)

== Why a New bundle format? ==
 * lightweight
 * new manifest
 * general delta
 * bookmarks
 * phase boundaries
 * obsolete markers
 * >sha1 support
 * pushkey
 * extensible for new features (required and optional)
 * progress information
 * resumable?
 * transaction commit markers?
 * recursive (to be able to bundle subrepos)

It's possible to envision a format that sends a change, its manifest, and its filenodes in each chunk, rather than sending all changesets, then all manifests, etc.

capabilities

== Reading the header ==
=== stage 1 ===
|| 'HG20' || Compression Chunk || rest of file ||

The Compression Chunk is either null or contains the ASCII string 'Compression=XX', where XX is a code indicating which decompression to use on the rest of the file.
=== stage 2 ===
|||| rest of file from stage 1 ||
|| Parameters Chunk || shredded changegroup (and possibly other sections?) ||

The Parameters Chunk records (among possibly other things?) that the file contains a changegroup ('\x0bCHANGEGROUP': a one-byte name length, 0x0b = 11, followed by the part name), a 4-byte null field, and then a nested sequence describing two categories of parameters (apparently the mandatory and advisory parameters of the Mercurial source). The nested sequence gives, first, the number of key:value pairs in the first category, then the number of pairs in the second category, then, for each pair, the length of its ASCII key followed by the length of its ASCII value; the keys and values themselves follow.

== Changes in current commands ==
=== Push Orchestration ===
==== Current situation ====
 * push:
  * changesets:
   * discovery
   * validation
   * actual push
  * phase:
   * discovery
   * pull
   * push
  * obsolescence:
   * discovery
   * push
  * bookmark:
   * discovery
   * push

==== Aimed orchestration ====
 * push:
  * discovery:
   * changesets
   * phase
   * obs
   * bookmark
  * post-discovery action:
   * current use case: moving common changesets that are seen as public to the public phase.
  * local-validation:
   * (much easier with everything in hand)
   * complains about:
    * multiple heads
    * new branch
    * troubled changesets
    * divergent bookmark
    * missing subrepo revisions
    * Rent in Manhattan
    * etc…
  * push:
   * (using multipart-bundle when possible)
    . The one and only remote-side transaction happens here.
  * (post-push) pull:
   * The server sends back its own multipart-bundle to the client.
    . (The server would be able to reply with a multi-part bundle, to inform the client of potential phase/bookmark/changeset rewrites, etc.)

==== post-push pull ====
If we let the protocol send arbitrary data to the server, we need the server to be able to send back arbitrary data too.

The idea is to use the very same top-level format. It could contain any kind of data the client has advertised that it understands. This last phase is advisory, so the client is free to ignore its content entirely.

Possible use cases are:

 * sending standard output back
 * sending standard error back
 * notification that a changeset was made public on push
 * notification of partially accepted changeset
 * notification of automatic bookmark move on the server
 * test case result (or test run key)
 * Automatic shipment of a pony to the contributor's address
 * … (the possibilities are endless)

=== Changes in Pull ===
The same kind of changes will happen, but pull is much simpler (I'm not worried at all about it). Pull may also be able to fetch subrepo revisions efficiently.

=== Changes in Bundle/Unbundle ===
Unbundle would learn to unbundle both formats.

Maybe we can have the new bundle format start with an invalid entry to prevent old versions of unbundle from trying to import them.

bundle should be able to produce new bundles. However, it probably cannot do so by default for a long time :-/

We could also do a "recursive bundle" in the presence of subrepos. A bundle could contain parts that are bundles of the subrepo revisions referenced by the revisions contained in the main bundle.

== Top level Bundle ==
=== content ===
On the remote side, the server will need to redo the validation that was done on the local side to ensure that nothing interesting happened between discovery and push. We need to send appropriate data to the remote for validation. This implies either arguments in the command data, or a dedicated section in the bundle. The dedicated section seems the way to go as it feels more flexible: we do not know what kind of data will be monitored and sent, so we cannot build a sensible set of arguments doing the job. With a dedicated section in the multi-part bundle, we can make this section evolve over time to match the evolution of the data we send to the server.

=== foreseen sections ===
Here are the ideas we already have about sections:

 * HG10 (old changeset bundle format)
 * HG19 (new changeset bundle with support for modern stuff)
 * pushkey data (phase, bookmarks)
 * obsolescence markers (format 1 and upcoming format 2 ?)
 * client capabilities (to be used for the reply multi-part bundle)
 * presence of subrepo bundles

== Format of the Bundle2 Container ==
The latest description of the binary format can be found as comments in the Mercurial source code. This is the source of truth.

=== Examples of top-level parameters ===
These are examples, **not actual proposals of final parameters**. Some of them are actually very clowny.

==== Mandatory options ====
 * Set a new format of part headers:
  . `PARTVERSION=1`

 * Have the payload use a special compression algorithm
  . `COMPRESSION=DOGEZIP`

 * Set the encoding of strings in part headers to GOST13052 (or EBCDIC if you insist)
  . `PARTENCODING=GOST13052`

 * Set the integer format in part headers to middle-endian
  . `ENDIANESS=PDP11`

==== Example advisory options ====

 * ask for debug level output in the reply
  . `debug=1`

 * inform of the total number of parts:
  . `nbparts=42`

 * inform of the total size of the bundle:
  . `totalsize=1337`

==== Example of -invalid- options ====

 * List of known heads (use a part for that)

 * username and/or credential (use a part for that)
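The split between mandatory and advisory options can be handled mechanically if the case of the first letter carries the distinction. That is the convention described in the bundle2 comments of the Mercurial source, and the examples above follow it (uppercase names are mandatory, lowercase ones advisory). A minimal sketch under that assumption; `checkParams` and the `known` set are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
	"unicode/utf8"
)

// checkParams returns the advisory parameters the reader will ignore, or an
// error if an unknown parameter is mandatory. Names starting with an
// uppercase letter are mandatory; lowercase ones are advisory.
func checkParams(params map[string]string, known map[string]bool) ([]string, error) {
	var ignored []string
	for name := range params {
		first, _ := utf8.DecodeRuneInString(name)
		if !unicode.IsLetter(first) {
			return nil, fmt.Errorf("invalid parameter name %q", name)
		}
		if known[name] {
			continue
		}
		if unicode.IsUpper(first) {
			return nil, fmt.Errorf("unsupported mandatory parameter %q", name)
		}
		ignored = append(ignored, name)
	}
	sort.Strings(ignored) // map iteration order is random
	return ignored, nil
}

func main() {
	known := map[string]bool{"Compression": true}
	ignored, err := checkParams(map[string]string{"Compression": "BZ", "debug": "1", "nbparts": "42"}, known)
	fmt.Println("ignored:", ignored, "err:", err)
	_, err = checkParams(map[string]string{"ENDIANESS": "PDP11"}, known)
	fmt.Println("err:", err)
}
```

An old reader can therefore skip `debug=1` or `nbparts=42` safely, but must abort on an unknown `ENDIANESS=PDP11`.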

== New type of Part ==
=== Changesets exchange ===
=== New header ===
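{{{#!highlight go
// Proposed header. On the wire, each variable-length field is preceded by
// a one-byte length (lNode, lP1, ...) giving its size in bytes.
type Header struct {
    length uint32
    lNode  byte
    node   []byte // lNode bytes

    // if empty (lP1 == 0), default to the previous node in the stream
    lP1 byte
    p1  []byte // lP1 bytes

    // if empty, nullrev
    lP2 byte
    p2  []byte // lP2 bytes

    // if empty, self (for changelogs)
    lLinknode byte
    linknode  []byte // lLinknode bytes

    // if empty, p1
    lDeltaParent byte
    deltaParent  []byte // lDeltaParent bytes
}
}}}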
We'll modify the existing changegroup type so it can pretend to be a new changegroup that just has a variety of empty fields. Progress information fields might be optional.

== Testing bundle2 ==
bundle2 can be enabled by setting the following hgrc option:

{{{
[experimental]
bundle2-exp = True
}}}
----
CategoryNewFeatures CategoryInternals CategoryProposedDeletion
Example Parameters Chunk:
|| chunk length |||| description of contents || #section1 parameters || #section2 parameters || len(key1),len(value1) || len(key2),len(value2) || key1 || value1 || key2 || value2||
|| 4 bytes || \x0bCHANGEGROUP || 4 bytes null || \x01 || \x01 || \x07\x02 || \t\x01 || version || 02 || nbchanges || 7 ||

BundleFormat2 (last edited 2018-02-10 00:05:58 by AviKelman)