Differences between revisions 27 and 28

Note:

This page is primarily intended for developers of Mercurial.

BundleFormat2

This page describes the current plan to get a more modern and complete bundle format. (For the old content of this page, see BundleFormatHG19.)

(Current content is copied and pasted from the 2.9 sprint notes.)

Why a New bundle format?

lightweight
new manifest
general delta
bookmarks
phase boundaries
obsolete markers
>sha1 support
pushkey
extensible for new features (required and optional)
progress information
resumable?
transaction commit markers?
recursive (to be able to bundle subrepos)

It's possible to envision a format that sends a change, its manifest, and filenodes in each chunk rather than sending all changesets, then all manifests, etc. capabilities

Changes in current command

Push Orchestration

Current situation

push:
- changesets:
  - discovery
  - validation
  - actual push
- phase:
  - discovery
  - pull
  - push
- obsolescence
  - discovery
  - push
- bookmark
  - discovery
  - push

Target orchestration

push:
- discovery:
  - changesets
  - phase
  - obs
  - bookmark
- post-discovery action:
  - current use case: move the phase of common changesets seen as public.
- local-validation:
  - (much easier with everything in hand)
  - complains about:
    - multiple heads
    - new branch
    - troubled changesets
    - divergent bookmark
    - missing subrepo revisions
    - Rent in Manhattan
    - etc…
- push:
  - (using multipart-bundle when possible)
    - The one and only remote-side transaction happens here
- (post-push) pull:
  - The server sends back its own multipart-bundle to the client
    - (The server would be able to reply with a multi-bundle, to inform the client of potential phase/bookmark/changeset rewrites etc…)

post-push pull

If we let the protocol send arbitrary data to the server, we need the server to be able to send back arbitrary data too.

The idea is to use the very same top-level format. It could contain anything the client has advertised that it understands. This last phase is advisory, so the client can decide to ignore its content entirely.

Possible use cases are:

sending standard output back
sending standard error back
notification that a changeset was made public on push
notification of partially accepted changeset
notification of automatic bookmark move on the server
test case result (or test run key)
Automatic shipment of Pony to contributor address
… (the possibilities are endless)
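Because the reply is advisory, a client can simply skip the parts it does not know. Here is a minimal sketch of that dispatch. The `(type, payload)` representation of a part, the part names, and `process_reply` are all hypothetical and only illustrate the idea; they are not the actual implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation of a reply part: (part type, raw payload).
Part = Tuple[str, bytes]


def process_reply(parts: Iterable[Part],
                  handlers: Dict[str, Callable[[bytes], None]]) -> List[str]:
    """Run the handler of each known part; return the types that were skipped."""
    skipped = []
    for part_type, payload in parts:
        handler = handlers.get(part_type)
        if handler is None:
            # The reply is advisory: an unknown part is ignored, not fatal.
            skipped.append(part_type)
            continue
        handler(payload)
    return skipped


# Usage: this client only understands "output" parts.
seen = []
reply = [("output", b"added 3 changesets\n"), ("pony-shipment", b"")]
skipped = process_reply(reply, {"output": seen.append})
```

After this runs, `seen` holds the output payload and `skipped` lists the part type the client did not understand.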

Changes in Pull

The same kind of changes will happen, but pull is much simpler (I'm not worried about it at all). It may also be able to pull subrepo revisions efficiently.

Changes in Bundle/Unbundle

Unbundle would learn to unbundle both formats.

Maybe we can have the new bundle format start with an invalid entry, to prevent an old unbundle from trying to import it.
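One way to picture this: an old client that checks the first bytes of the stream will refuse anything it does not recognise. The sketch below assumes a bundle starts with `HG` followed by a two-character version, as the HG10 and HG19 names on this page suggest. The `2X` version used in the usage lines is a made-up value, and `read_bundle_version` is a hypothetical helper, not the real unbundle code.

```python
import io

# Versions an old client understands. "10" comes from the HG10 format name.
KNOWN_VERSIONS = {b"10"}


def read_bundle_version(stream) -> bytes:
    """Read the 4-byte header and return the version, or refuse the bundle."""
    header = stream.read(4)
    magic, version = header[:2], header[2:4]
    if magic != b"HG":
        raise ValueError("not a Mercurial bundle")
    if version not in KNOWN_VERSIONS:
        # A new-format bundle stops here instead of being misparsed.
        raise ValueError(
            "unknown bundle version %s" % version.decode("ascii", "replace"))
    return version


# Usage: an old-style header is accepted, a hypothetical new one is refused.
old_version = read_bundle_version(io.BytesIO(b"HG10UN"))
try:
    read_bundle_version(io.BytesIO(b"HG2X"))
    refused = False
except ValueError:
    refused = True
```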

bundle should be able to produce the new bundle format. It probably cannot do so by default for a long time, however :-/

We could also do a "recursive bundle" in the presence of subrepos. A bundle could contain parts that are bundles of the subrepo revisions referenced by the revisions contained in the main bundle.

Top level Bundle

content

On the remote side, the server will need to redo the validation that was done on the client side to ensure that nothing interesting happened between discovery and push. We need to send appropriate data to the remote for validation. This implies either arguments in the command data, or a dedicated section in the bundle. The dedicated section seems the way to go as it feels more flexible. We do not know what kind of data will be monitored and sent, so we cannot build a sensible set of arguments to do the job. With a dedicated section in the multi-part bundle, we can make this section evolve over time to match the evolution of the data we send to the server.
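As an illustration of such a validation section, the client could send the heads it saw during discovery, and the server could abort if its heads have changed since. This is only a sketch: the payload layout (concatenated 20-byte nodes) and the names `decode_heads` and `check_heads` are assumptions made for the example.

```python
from typing import Iterable, Set

NODE_LEN = 20  # assumption: SHA-1 sized node identifiers


def decode_heads(payload: bytes) -> Set[bytes]:
    """Split a payload of concatenated fixed-size nodes into a set of heads."""
    if len(payload) % NODE_LEN:
        raise ValueError("payload is not a whole number of nodes")
    return {payload[i:i + NODE_LEN] for i in range(0, len(payload), NODE_LEN)}


def check_heads(expected: Iterable[bytes], current: Iterable[bytes]) -> None:
    """Abort if the repository heads changed since the client's discovery."""
    if set(expected) != set(current):
        raise RuntimeError("repository changed since discovery; push aborted")


# Usage: the client saw heads a and b; the server still has exactly those.
node_a, node_b = b"a" * NODE_LEN, b"b" * NODE_LEN
check_heads(decode_heads(node_a + node_b), [node_b, node_a])
```

Keeping the check in its own section means new kinds of checks can be added later without changing the command arguments.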

Foreseen sections

Here are the ideas we already have about sections:

HG10 (old changeset bundle format)
HG19 (new changeset bundle with support for modern stuff)
pushkey data (phase, bookmarks)
obsolescence markers (format 1 and upcoming format 2 ?)
client capabilities (to be used for the reply multi-part bundle)
presence of subrepo bundles

Format of the Bundle2 Container

The latest description of the binary format can be found as a comment in the Mercurial source code. This is the source of truth.

Examples of top-level parameters

These are examples, **not actual proposals for final parameters**. Some of them are deliberately silly.

Mandatory options

Set a new format for part headers:
- PARTVERSION=1
Have the payload use a special compression algorithm:
- COMPRESSION=DOGEZIP
Set the encoding of strings in part headers to GOST13052 (or EBCDIC if you insist):
- PARTENCODING=GOST13052
Set the integer format in part headers to middle-endian:
- ENDIANESS=PDP11

Example advisory options

ask for debug level output in the reply
- debug=1
inform of total number of parts:
- nbparts=42
inform of total size of the bundle:
- totalsize=1337
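The difference between the two kinds of options shows up when a receiver meets a name it does not know. The sketch below assumes, as the examples here suggest, that upper-case names are mandatory and lower-case names are advisory. `check_params` is a hypothetical helper written for this page, not the real parser.

```python
from typing import Dict, Set


def check_params(params: Dict[str, str], known: Set[str]) -> Dict[str, str]:
    """Return the parameters to act on; fail on unknown mandatory ones."""
    accepted = {}
    for name, value in params.items():
        if name in known:
            accepted[name] = value
        elif name[:1].isupper():
            # Assumed convention: an upper-case first letter means mandatory.
            raise ValueError("unsupported mandatory parameter: %s" % name)
        # Unknown advisory parameters are silently dropped.
    return accepted


# Usage: this receiver knows PARTVERSION and debug, nothing else.
known = {"PARTVERSION", "debug"}
accepted = check_params({"PARTVERSION": "1", "nbparts": "42"}, known)
```

Here `nbparts` is dropped without error, while an unknown mandatory option such as `COMPRESSION=DOGEZIP` would make the receiver refuse the whole bundle.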

Example of invalid options

List of known heads (use a part for that)
username and/or credential (use a part for that)

New type of Part

Changesets exchange

New header

type Header struct {
    length       uint32
    lNode        byte
    node         [lNode]byte

    // if empty (lP1 ==0) then default to previous node in the stream
    lP1          byte
    p1           [lP1]byte

    // if empty, nullrev
    lP2          byte
    p2           [lP2]byte

    // if empty, self (for changelogs)
    lLinknode    byte
    linknode     [lLinknode]byte

    // if empty, p1
    lDeltaParent byte
    deltaParent  [lDeltaParent]byte
}

We'll modify the existing changegroup type so it can pretend to be a new changegroup that just has a variety of empty fields. Progress information fields might be optional.
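To make the defaulting rules of the header concrete, here is a rough parsing sketch. It rests on several assumptions that are not settled on this page: the `length` field is read as a big-endian uint32 and not interpreted further, the null node is 20 zero bytes, and the previous node is passed in by the caller. `parse_header` and the `Header` tuple are hypothetical names.

```python
import struct
from typing import NamedTuple

NULL_ID = b"\0" * 20  # assumption: 20-byte null node


class Header(NamedTuple):
    node: bytes
    p1: bytes
    p2: bytes
    linknode: bytes
    delta_parent: bytes


def parse_header(data: bytes, prev_node: bytes = NULL_ID) -> Header:
    """Parse one header; `data` starts at the uint32 length field."""
    # Assumption: big-endian. The value is read but not interpreted here.
    (_length,) = struct.unpack_from(">I", data, 0)
    offset = 4
    fields = []
    for _ in range(5):  # node, p1, p2, linknode, deltaParent
        if offset >= len(data):
            raise ValueError("truncated header")
        size = data[offset]  # one-byte length prefix
        offset += 1
        field = data[offset:offset + size]
        if len(field) != size:
            raise ValueError("truncated header")
        fields.append(field)
        offset += size
    node, p1, p2, linknode, delta_parent = fields
    p1 = p1 or prev_node  # empty p1: previous node in the stream
    return Header(node, p1, p2 or NULL_ID,  # empty p2: null
                  linknode or node,         # empty linknode: self
                  delta_parent or p1)       # empty deltaParent: p1


# Usage: a header carrying only a node; every other field is empty.
node, prev = b"n" * 20, b"p" * 20
raw = struct.pack(">I", 0) + bytes([20]) + node + b"\0" * 4
header = parse_header(raw, prev_node=prev)
```

With four empty fields, the header above costs only four extra bytes compared with a bare node, which is the appeal of the length-prefixed layout.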

Testing bundle2

bundle2 can be enabled by setting the following hgrc option:

[experimental]
bundle2-exp = True

CategoryNewFeatures

Revision 27 as of 2014-12-31 01:33:29
  Size: 6339
  Editor: Pierre-YvesDavid
  Comment: drop most of the binary description format as it is already contained in the source code.
Revision 28 as of 2015-01-19 13:16:31
  Size: 6334
  Editor: rom1dep
  Comment: