[PATCH 1 of 3] help: clarify revision / chunk behavior

Gregory Szorc gregory.szorc at gmail.com
Mon Feb 27 20:54:00 UTC 2017


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1488226671 28800
#      Mon Feb 27 12:17:51 2017 -0800
# Node ID ded4aedfaffbabce6c083f660fc5feeeeb287f0c
# Parent  abb92b3d370e116b29eba4d2e3154e9691c8edbb
help: clarify revision / chunk behavior

Try to make it easier to understand the differences between the logical
and physical model of revlog storage.

diff --git a/mercurial/help/internals/revlogs.txt b/mercurial/help/internals/revlogs.txt
--- a/mercurial/help/internals/revlogs.txt
+++ b/mercurial/help/internals/revlogs.txt
@@ -2,17 +2,18 @@ Revision logs - or *revlogs* - are an ap
 storing discrete entries, or *revisions*. They are the primary storage
 mechanism of repository data.
 
+A revlog revision logically consists of 2 parts: metadata and a content
+blob. Metadata includes the hash of the revision's content, sizes, and
+links to its *parent* entries. The collective metadata is referred
+to as the *index* and the revision content is the *data*.
+
 Revlogs effectively model a directed acyclic graph (DAG). Each node
 has edges to 1 or 2 *parent* nodes. Each node contains metadata and
 the raw value for that node.
 
-Revlogs consist of entries which have metadata and revision data.
-Metadata includes the hash of the revision's content, sizes, and
-links to its *parent* entries. The collective metadata is referred
-to as the *index* and the revision data is the *data*.
-
-Revision data is stored as a series of compressed deltas against previous
-revisions.
+The revision data physically stored in a revlog entry is referred to as
+a *chunk*. A *chunk* is either the raw fulltext of a revision or a delta
+against a previous fulltext. In both cases, a *chunk* may be compressed.
 
 Revlogs are written in an append-only fashion. We never need to rewrite
 a file to insert nor do we need to remove data. Rolling back in-progress
@@ -87,7 +88,7 @@ 0-3 (4 bytes) (rev 0 only)
    Revlog header
 
 0-5 (6 bytes)
-   Absolute offset of revision data from beginning of revlog.
+   Absolute offset of revision chunk from beginning of revlog.
 
 6-7 (2 bytes)
    Bit flags impacting revision behavior. The following bit offsets define:
@@ -100,15 +101,15 @@ 6-7 (2 bytes)
    2: REVIDX_EXTSTORED revision data is stored externally.
 
 8-11 (4 bytes)
-   Compressed length of revision data / chunk as stored in revlog.
+   Compressed length of revision chunk as stored in revlog.
 
 12-15 (4 bytes)
    Uncompressed length of revision data. This is the size of the full
-   revision data, not the size of the chunk post decompression.
+   revision data (as opposed to the delta/chunk).
 
 16-19 (4 bytes)
    Base or previous revision this revision's delta was produced against.
-   -1 means this revision holds full text (as opposed to a delta).
+   -1 means this chunk holds full text (as opposed to a delta).
    For generaldelta repos, this is the previous revision in the delta
    chain. For non-generaldelta repos, this is the base or first
    revision in the delta chain.
@@ -185,16 +186,16 @@ The actual layout of revlog files on dis
 *store format*. Typically, a ``.i`` file represents the index revlog
 (possibly containing inline data) and a ``.d`` file holds the revision data.
 
-Revision Entries
-================
+Revision Chunks
+===============
 
-Revision entries consist of an optional 1 byte header followed by an
-encoding of the revision data. The headers are as follows:
+Chunks in revision entries consist of an optional 1-byte header followed
+by an encoding of the chunk data. The headers are as follows:
 
 \0 (0x00)
-   Revision data is the entirety of the entry, including this header.
+   Chunk data is the entirety of the entry, including this header.
 u (0x75)
-   Raw revision data follows.
+   Raw chunk data follows.
 x (0x78)
    zlib (RFC 1950) data.
 
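To illustrate the index fields documented in the hunks above, here is a minimal Python sketch that decodes one 64-byte entry with the standard struct module. It assumes a version 1 index whose chunks live in a separate .d file, so there is no inline data. The function name parse_index_entry and the dict keys are hypothetical. The link revision, parent and node fields come from the part of the entry table that these hunks do not show.

```python
import struct

# 64-byte v1 index entry: one big-endian 64-bit integer packing the
# 6-byte offset (bytes 0-5) and 2-byte flags (bytes 6-7), six signed
# 32-bit integers, a 20-byte node, and 12 bytes of zero padding.
INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")


def parse_index_entry(index_data, rev):
    """Decode entry ``rev`` from index data without inline chunks.

    parse_index_entry is a hypothetical helper for illustration only.
    """
    start = rev * INDEX_ENTRY.size
    (offset_flags, comp_len, uncomp_len, base_rev,
     link_rev, p1, p2, node) = INDEX_ENTRY.unpack_from(index_data, start)
    header = None
    if rev == 0:
        # Bytes 0-3 of the first entry hold the revlog header. The first
        # chunk always starts at offset 0, so the offset bits are cleared.
        header = offset_flags >> 32
        offset_flags &= 0xFFFF
    return {
        "header": header,
        "offset": offset_flags >> 16,      # bytes 0-5
        "flags": offset_flags & 0xFFFF,    # bytes 6-7
        "chunk_length": comp_len,          # bytes 8-11
        "fulltext_length": uncomp_len,     # bytes 12-15
        "base_rev": base_rev,              # bytes 16-19
        "link_rev": link_rev,
        "p1": p1,
        "p2": p2,
        "node": node,
    }
```

With this layout, a flags value of 4 corresponds to bit offset 2, REVIDX_EXTSTORED.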

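The chunk headers listed under Revision Chunks map onto a small dispatch. The following sketch is an illustration under assumptions, not Mercurial's code. The name decompress_chunk is hypothetical, an empty chunk is assumed to decode to empty bytes, and unknown headers raise ValueError. In the x case, 0x78 is the first byte of the zlib (RFC 1950) stream itself, so the whole chunk, header included, is passed to zlib.

```python
import zlib


def decompress_chunk(chunk):
    """Return the decoded bytes of a stored chunk (hypothetical helper)."""
    if not chunk:
        # Assumption: an empty chunk decodes to empty data.
        return b""
    header = chunk[0:1]
    if header == b"x":
        # 0x78 belongs to the zlib stream, so nothing is stripped.
        return zlib.decompress(chunk)
    if header == b"\0":
        # The NUL byte is part of the data itself.
        return chunk
    if header == b"u":
        # The marker only flags uncompressed data; drop it.
        return chunk[1:]
    raise ValueError("unknown chunk header: %r" % header)
```

The decoded bytes are either a fulltext or a delta. The chunk header alone does not say which; that comes from the index entry's base revision field.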

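The split between a logical revision and its physical chunks shows up when a fulltext is rebuilt from its delta chain using the base revision field (bytes 16-19). This minimal sketch makes three assumptions. First, base_revs is a plain list of that field indexed by revision. Second, both -1 (as documented here) and a revision that names itself mark a full-text chunk. Third, reading chunks and applying deltas are left to hypothetical callbacks, because the delta encoding is not covered in this text.

```python
def delta_chain(base_revs, rev, generaldelta):
    """Return the revisions whose chunks rebuild ``rev``, base first."""
    if generaldelta:
        # Each entry names the revision its delta was computed against;
        # follow those pointers back to a full-text chunk.
        chain = []
        current = rev
        while base_revs[current] not in (-1, current):
            chain.append(current)
            current = base_revs[current]
        chain.append(current)
        chain.reverse()
        return chain
    # Without generaldelta the field names the first revision of the chain,
    # and each revision after it is a delta against its predecessor.
    base = base_revs[rev]
    if base == -1:
        base = rev
    return list(range(base, rev + 1))


def rebuild_fulltext(chain, read_chunk, apply_delta):
    # read_chunk and apply_delta are hypothetical callbacks: the first
    # returns a decompressed chunk, the second applies one delta to a text.
    text = read_chunk(chain[0])
    for r in chain[1:]:
        text = apply_delta(text, read_chunk(r))
    return text
```

The generaldelta branch follows one pointer per step, so its chain can skip revisions. The non-generaldelta branch always walks a contiguous range.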