D4412: internals: document CBOR utilization

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Mon Sep 3 13:38:15 UTC 2018


This revision was automatically updated to reflect the committed changes.
Closed by commit rHG2fe21c65777e: internals: document CBOR utilization (authored by indygreg, committed by ).

CHANGED PRIOR TO COMMIT
  https://phab.mercurial-scm.org/D4412?vs=10624&id=10726#toc

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D4412?vs=10624&id=10726

REVISION DETAIL
  https://phab.mercurial-scm.org/D4412

AFFECTED FILES
  contrib/wix/help.wxs
  mercurial/help.py
  mercurial/help/internals/cbor.txt
  tests/test-help.t

CHANGE DETAILS

diff --git a/tests/test-help.t b/tests/test-help.t
--- a/tests/test-help.t
+++ b/tests/test-help.t
@@ -1010,6 +1010,7 @@
   
        bundle2       Bundle2
        bundles       Bundles
+       cbor          CBOR
        censor        Censor
        changegroups  Changegroups
        config        Config Registrar
@@ -3294,6 +3295,13 @@
   Bundles
   </td></tr>
   <tr><td>
+  <a href="/help/internals.cbor">
+  cbor
+  </a>
+  </td><td>
+  CBOR
+  </td></tr>
+  <tr><td>
   <a href="/help/internals.censor">
   censor
   </a>
diff --git a/mercurial/help/internals/cbor.txt b/mercurial/help/internals/cbor.txt
new file mode 100644
--- /dev/null
+++ b/mercurial/help/internals/cbor.txt
@@ -0,0 +1,130 @@
+Mercurial uses Concise Binary Object Representation (CBOR)
+(RFC 7049) for various data formats.
+
+This document describes the subset of CBOR that Mercurial uses and
+gives recommendations for appropriate use of CBOR within Mercurial.
+
+Type Limitations
+================
+
+Major types 0 and 1 (unsigned integers and negative integers) MUST be
+fully supported.
+
+Major type 2 (byte strings) MUST be fully supported. However, there
+are limitations around the use of indefinite-length byte strings.
+(See below.)
+
+Major type 3 (text strings) are NOT supported.
+
+Major type 4 (arrays) MUST be supported. However, values are limited
+to the set of types described in the "Container Types" section below.
+And indefinite-length arrays are NOT supported.
+
+Major type 5 (maps) MUST be supported. However, key values are limited
+to the set of types described in the "Container Types" section below.
+And indefinite-length maps are NOT supported.
+
+Major type 6 (semantic tagging of major types) can be used with the
+following semantic tag values:
+
+258
+   Mathematical finite set. Suitable for representing Python's
+   ``set`` type.
+
+All other semantic tag values are not allowed.
+
+Major type 7 (simple data types) can be used with the following
+type values:
+
+20
+   False
+21
+   True
+22
+   Null
+31
+   Break stop code (for indefinite-length items).
+
+All other simple data type values (including every value requiring the
+1 byte extension) are disallowed.
+
+Indefinite-Length Byte Strings
+==============================
+
+Indefinite-length byte strings (major type 2) are allowed. However,
+they MUST NOT occur inside a container type (such as an array or map).
+i.e. they can only occur as the "top-most" element in a stream of
+values.
+
+Encoders and decoders SHOULD *stream* indefinite-length byte strings.
+i.e. an encoder or decoder SHOULD NOT buffer the entirety of a long
+byte string value when indefinite-length byte strings are being used
+if it can be avoided. Mercurial MAY use extremely long indefinite-length
+byte strings and buffering the source or destination value COULD lead to
+memory exhaustion.
+
+Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
+bytes.
+
+Container Types
+===============
+
+Mercurial may use the array (major type 4), map (major type 5), and
+set (semantic tag 258 plus major type 4 array) container types.
+
+An array may contain any supported type as values.
+
+A map MUST only use the following types as keys:
+
+* unsigned integers (major type 0)
+* negative integers (major type 1)
+* byte strings (major type 2) (but not indefinite-length byte strings)
+* false (simple type 20)
+* true (simple type 21)
+* null (simple type 22)
+
+A map MUST only use the following types as values:
+
+* all types supported as map keys
+* arrays
+* maps
+* sets
+
+A set may only use the following types as values:
+
+* all types supported as map keys
+
+It is recommended that keys in maps and values in sets and arrays all
+be of a uniform type.
+
+Avoiding Large Byte Strings
+===========================
+
+The use of large byte strings is discouraged, especially in scenarios where
+the total size of the byte string may by unbound for some inputs (e.g. when
+representing the content of a tracked file). It is highly recommended to use
+indefinite-length byte strings for these purposes.
+
+Since indefinite-length byte strings cannot be nested within an outer
+container (such as an array or map), to associate a large byte string
+with another data structure, it is recommended to use an array or
+map followed immediately by an indefinite-length byte string. For example,
+instead of the following map::
+
+   {
+      "key1": "value1",
+      "key2": "value2",
+      "long_value": "some very large value...",
+   }
+
+Use a map followed by a byte string:
+
+   {
+      "key1": "value1",
+      "key2": "value2",
+      "value_follows": True,
+   }
+   <BEGIN INDEFINITE-LENGTH BYTE STRING>
+   "some very large value"
+   "..."
+   <END INDEFINITE-LENGTH BYTE STRING>
diff --git a/mercurial/help.py b/mercurial/help.py
--- a/mercurial/help.py
+++ b/mercurial/help.py
@@ -205,6 +205,8 @@
      loaddoc('bundle2', subdir='internals')),
     (['bundles'], _('Bundles'),
      loaddoc('bundles', subdir='internals')),
+    (['cbor'], _('CBOR'),
+     loaddoc('cbor', subdir='internals')),
     (['censor'], _('Censor'),
      loaddoc('censor', subdir='internals')),
     (['changegroups'], _('Changegroups'),
diff --git a/contrib/wix/help.wxs b/contrib/wix/help.wxs
--- a/contrib/wix/help.wxs
+++ b/contrib/wix/help.wxs
@@ -43,6 +43,7 @@
           <Component Id="help.internals" Guid="$(var.help.internals.guid)" Win64='$(var.IsX64)'>
             <File Id="internals.bundle2.txt"      Name="bundle2.txt" />
             <File Id="internals.bundles.txt"      Name="bundles.txt" KeyPath="yes" />
+            <File Id="internals.cbor.txt"         Name="cbor.txt" />
             <File Id="internals.censor.txt"       Name="censor.txt" />
             <File Id="internals.changegroups.txt" Name="changegroups.txt" />
             <File Id="internals.config.txt"       Name="config.txt" />



To: indygreg, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list