[PATCH 1 of 3] help: clarify revision / chunk behavior
quark at fb.com
Thu Mar 2 17:19:25 EST 2017
Excerpts from Gregory Szorc's message of 2017-02-27 12:54:00 -0800:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1488226671 28800
> # Mon Feb 27 12:17:51 2017 -0800
> # Node ID ded4aedfaffbabce6c083f660fc5feeeeb287f0c
> # Parent abb92b3d370e116b29eba4d2e3154e9691c8edbb
> help: clarify revision / chunk behavior
> Try to make it easier to understand the differences between the logical
> and physical model of revlog storage.
> diff --git a/mercurial/help/internals/revlogs.txt b/mercurial/help/internals/revlogs.txt
> --- a/mercurial/help/internals/revlogs.txt
> +++ b/mercurial/help/internals/revlogs.txt
> @@ -2,17 +2,18 @@ Revision logs - or *revlogs* - are an ap
> storing discrete entries, or *revisions*. They are the primary storage
> mechanism of repository data.
> +A revlog revision logically consists of 2 parts: metadata and a content
"revision" has not been defined yet for a new reader at this point. How about
moving this text to the paragraph below and replacing "revision" with "node"
(or making it clear that a "node" is a "revision")?
> +blob. Metadata includes the hash of the revision's content, sizes, and
> +links to its *parent* entries. The collective metadata is referred
> +to as the *index* and the revision content is the *data*.
> Revlogs effectively model a directed acyclic graph (DAG). Each node
> has edges to 1 or 2 *parent* nodes. Each node contains metadata and
> the raw value for that node.
> -Revlogs consist of entries which have metadata and revision data.
> -Metadata includes the hash of the revision's content, sizes, and
> -links to its *parent* entries. The collective metadata is referred
> -to as the *index* and the revision data is the *data*.
Actually I think the old version is good enough and in a better order -
first introduce the DAG concept, then explain details.
> -Revision data is stored as a series of compressed deltas against previous
I'd keep the above sentence - it's concise and does not hurt.
> +The revision data physically stored in a revlog entry is referred to as
"entry" vs "revision" vs "node" could confuse new people.
> +a *chunk*. A *chunk* is either the raw fulltext of a revision or a delta
> +against a previous fulltext. In both cases, a *chunk* may be compressed.
I'd say "against another revision". "previous" may imply rev-1, and "fulltext"
may imply that a delta base cannot itself be a delta.
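To illustrate why "another revision" is the safer wording, here is a minimal sketch of rebuilding a fulltext by walking a delta chain. It is a toy model, not Mercurial code: the `(base_rev, chunk)` tuples, the `(start, end, replacement)` hunk format, and the names `apply_delta` and `fulltext` are all assumptions made for illustration, and the base is taken to be the revision the delta was produced against.

```python
# Toy model only: real revlogs use a binary delta format and an on-disk index.

def apply_delta(text, delta):
    """Apply (start, end, replacement) hunks, given in ascending order."""
    out = []
    pos = 0
    for start, end, replacement in delta:
        out.append(text[pos:start])  # unchanged bytes before the hunk
        out.append(replacement)      # bytes replacing text[start:end]
        pos = end
    out.append(text[pos:])           # unchanged tail
    return b"".join(out)


def fulltext(entries, rev):
    """Rebuild the fulltext of `rev` by walking bases until a fulltext chunk."""
    chain = []
    while entries[rev][0] != -1:     # -1 marks a chunk that holds a fulltext
        chain.append(entries[rev][1])
        rev = entries[rev][0]
    text = entries[rev][1]
    for delta in reversed(chain):    # apply the oldest delta first
        text = apply_delta(text, delta)
    return text


entries = [
    (-1, b"hello world\n"),      # rev 0: fulltext
    (0, [(0, 5, b"HELLO")]),     # rev 1: delta against rev 0
    (0, [(6, 11, b"there")]),    # rev 2: delta against rev 0, not rev-1
    (2, [(0, 5, b"howdy")]),     # rev 3: delta against rev 2, itself a delta
]

print(fulltext(entries, 3))
```

In this model rev 2 is not stored against rev 1, and rev 3 is stored against a chunk that is itself a delta, which are the two cases the words "previous" and "fulltext" would seem to rule out.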
> Revlogs are written in an append-only fashion. We never need to rewrite
> a file to insert nor do we need to remove data. Rolling back in-progress
> @@ -87,7 +88,7 @@ 0-3 (4 bytes) (rev 0 only)
> Revlog header
> 0-5 (6 bytes)
> - Absolute offset of revision data from beginning of revlog.
> + Absolute offset of revision chunk from beginning of revlog.
> 6-7 (2 bytes)
> Bit flags impacting revision behavior. The following bit offsets define:
> @@ -100,15 +101,15 @@ 6-7 (2 bytes)
> 2: REVIDX_EXTSTORED revision data is stored externally.
> 8-11 (4 bytes)
> - Compressed length of revision data / chunk as stored in revlog.
> + Compressed length of revision chunk as stored in revlog.
> 12-15 (4 bytes)
> Uncompressed length of revision data. This is the size of the full
> - revision data, not the size of the chunk post decompression.
> + revision data (as opposed to the delta/chunk).
> 16-19 (4 bytes)
> Base or previous revision this revision's delta was produced against.
> - -1 means this revision holds full text (as opposed to a delta).
> + -1 means this chunk holds full text (as opposed to a delta).
> For generaldelta repos, this is the previous revision in the delta
> chain. For non-generaldelta repos, this is the base or first
> revision in the delta chain.
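For readers following the byte ranges in this hunk, a rough sketch of unpacking the first 20 bytes of an index entry might look like the following. The function name `parse_entry_prefix`, the dictionary keys, the big-endian byte order, and the choice of unsigned lengths are assumptions for illustration; the hunk itself only gives the byte ranges, and the rev 0 special case (where bytes 0-3 hold the revlog header) is not handled here.

```python
import struct


def parse_entry_prefix(entry):
    """Unpack bytes 0-19 of an index entry (hypothetical helper).

    Assumes big-endian fields: 6 bytes of offset and 2 bytes of flags
    (read together as one 64-bit integer), two 4-byte lengths, and a
    signed 4-byte base revision.
    """
    offset_flags, comp_len, uncomp_len, base_rev = struct.unpack(
        ">QIIi", entry[:20]
    )
    return {
        "offset": offset_flags >> 16,       # bytes 0-5: offset of the chunk
        "flags": offset_flags & 0xFFFF,     # bytes 6-7: bit flags
        "compressed_length": comp_len,      # bytes 8-11: chunk size as stored
        "uncompressed_length": uncomp_len,  # bytes 12-15: full revision size
        "base_rev": base_rev,               # bytes 16-19: -1 means fulltext
    }


# Build a synthetic entry prefix and read it back.
sample = struct.pack(">QIIi", (1234 << 16) | 0, 40, 100, -1)
print(parse_entry_prefix(sample))
```

Note that the compressed length describes the chunk while the uncompressed length describes the full revision, so the two are not directly comparable when the chunk is a delta.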
> @@ -185,16 +186,16 @@ The actual layout of revlog files on dis
> *store format*. Typically, a ``.i`` file represents the index revlog
> (possibly containing inline data) and a ``.d`` file holds the revision data.
> -Revision Entries
> +Revision Chunks
> -Revision entries consist of an optional 1 byte header followed by an
> -encoding of the revision data. The headers are as follows:
> +Chunks in revision entries consist of an optional 1 byte header followed
> +by an encoding of the chunk data. The headers are as follows:
> \0 (0x00)
> - Revision data is the entirety of the entry, including this header.
> + Chunk data is the entirety of the entry, including this header.
> u (0x75)
> - Raw revision data follows.
> + Raw chunk data follows.
> x (0x78)
> zlib (RFC 1950) data.
These changes look good to me.
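As a reading aid for the header table above, a minimal sketch of decoding a chunk by its first byte could look like this. The name `decode_chunk` is hypothetical, and two behaviors are assumptions rather than things the help text states: an empty chunk is returned unchanged, and for `x` the header byte is treated as the first byte of the zlib stream (zlib streams using the default window size begin with 0x78), so the whole chunk is passed to the decompressor.

```python
import zlib


def decode_chunk(chunk):
    """Return the chunk data for a stored chunk (hypothetical helper)."""
    if not chunk:
        return chunk                   # assumption: empty chunks pass through
    header = chunk[0:1]
    if header == b"\0":
        return chunk                   # data is the whole entry, header included
    if header == b"u":
        return chunk[1:]               # raw data follows the header
    if header == b"x":
        return zlib.decompress(chunk)  # header doubles as the zlib first byte
    raise ValueError("unknown chunk header: %r" % header)


print(decode_chunk(b"uplain text"))
print(decode_chunk(zlib.compress(b"compressed text")))
```

The result is still only the chunk: if the chunk is a delta, it has to be applied to its base before it yields the revision's fulltext.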