Differences between revisions 3 and 4
Revision 3 as of 2005-12-04 21:16:21
Size: 1095
Editor: EricHopper
Comment:
Revision 4 as of 2006-12-10 19:53:49
Size: 1360
Editor: 10
Comment:
Deletions are marked with "-". Additions are marked with "+".
Line 1:
- Some thoughts on possible improvements to Revlog.
+ RevlogNG was introduced with Mercurial 0.9.
Line 3:
- Things that could use fixing:
+ Deficiencies in original revlog format:
Line 11:
- The current index layout is:
+ The original index format was:
Line 22:
- Possible future layout:
+ RevlogNG format:
Line 35:
- Also, it may be interesting to compress entire spans with a single gzip context, with flushes between revisions. This will possibly result in much better compression.
+ RevlogNG header:

+ As the offset of the first data chunk is always zero, the first offset is used to indicate revlog version number and flags.

+ RevlogNG also supports interleaving of index and data. This can greatly reduce storage overhead for smaller revlogs. In this format, the data chunk immediately follows its index entry. The position of the next index entry is calculated by adding the compressed length to the offset.

RevlogNG was introduced with Mercurial 0.9.

Deficiencies in original revlog format:

  • no uncompressed revision size stored
  • SHA1 hash is potentially too weak
  • compression context for deltas is often too small
  • offset range is limited to 4GB
  • some metadata is indicated by escaping in the data

The original index format was:

  • 4 bytes: offset
  • 4 bytes: compressed length
  • 4 bytes: base revision
  • 4 bytes: link revision
  • 20 bytes: nodeid
  • 20 bytes: parent 1 nodeid
  • 20 bytes: parent 2 nodeid
  • 76 bytes total
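
The original entry layout can be sketched with Python's struct module. This is only an illustration: big-endian byte order, signed 32-bit integers, and the names V0_ENTRY and unpack_v0 are assumptions of the sketch, not something stated on this page.

```python
import struct

# Assumed encoding: big-endian, four signed 32-bit integers followed by
# three 20-byte node ids, in the order the fields are listed above.
V0_ENTRY = struct.Struct(">4l20s20s20s")  # 4*4 + 3*20 bytes per entry


def unpack_v0(raw):
    """Split one original-format index entry into named fields."""
    offset, comp_len, base_rev, link_rev, node, p1, p2 = V0_ENTRY.unpack(raw)
    return {
        "offset": offset,
        "comp_len": comp_len,
        "base_rev": base_rev,
        "link_rev": link_rev,
        "nodeid": node,
        "p1": p1,
        "p2": p2,
    }


# Hypothetical entry: first revision, 11 bytes of compressed data, no parents.
null_id = b"\x00" * 20
entry = V0_ENTRY.pack(0, 11, 0, 0, b"\x11" * 20, null_id, null_id)
fields = unpack_v0(entry)
```

Because the offset is a 4-byte field, it cannot address data beyond 2**32 bytes, which is the offset limit listed among the deficiencies.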

RevlogNG format:

  • 6 bytes: offset (allows for 256TB of compressed history per file)
  • 2 bytes: flags
  • 4 bytes: compressed length
  • 4 bytes: uncompressed length
  • 4 bytes: base revision
  • 4 bytes: link revision
  • 4 bytes: parent 1 revision
  • 4 bytes: parent 2 revision
  • 32 bytes: nodeid
  • 64 bytes total
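
A minimal sketch of packing and unpacking one RevlogNG entry follows. It assumes big-endian encoding, with the 6-byte offset and the 2-byte flags sharing a single 64-bit word (offset in the upper 48 bits, flags in the lower 16), and a node id that is zero-padded to 32 bytes. Those choices, and the names NG_ENTRY, pack_ng and unpack_ng, are assumptions of this sketch.

```python
import struct

# Assumed encoding: one 64-bit word for offset+flags, six signed 32-bit
# integers, and a 32-byte node id field.
NG_ENTRY = struct.Struct(">Qiiiiii32s")


def pack_ng(offset, flags, comp_len, uncomp_len, base_rev, link_rev,
            p1_rev, p2_rev, nodeid):
    """Build one 64-byte index entry from its fields."""
    if not 0 <= offset < 1 << 48:
        raise ValueError("offset does not fit in 6 bytes")
    if not 0 <= flags < 1 << 16:
        raise ValueError("flags do not fit in 2 bytes")
    offset_flags = (offset << 16) | flags
    # A shorter hash (for example 20-byte SHA1) is padded with zero bytes.
    return NG_ENTRY.pack(offset_flags, comp_len, uncomp_len, base_rev,
                         link_rev, p1_rev, p2_rev, nodeid.ljust(32, b"\x00"))


def unpack_ng(raw):
    """Split one 64-byte index entry into named fields."""
    (offset_flags, comp_len, uncomp_len, base_rev, link_rev,
     p1_rev, p2_rev, nodeid) = NG_ENTRY.unpack(raw)
    return {
        "offset": offset_flags >> 16,
        "flags": offset_flags & 0xFFFF,
        "comp_len": comp_len,
        "uncomp_len": uncomp_len,
        "base_rev": base_rev,
        "link_rev": link_rev,
        "p1_rev": p1_rev,
        "p2_rev": p2_rev,
        "nodeid": nodeid,
    }


# Hypothetical entry with an offset beyond the old 32-bit range.
raw = pack_ng((1 << 40) + 5, 3, 100, 250, 7, 42, 6, -1, b"\xab" * 20)
entry = unpack_ng(raw)
```

Storing parents as revision numbers rather than node ids is what lets the entry shrink while still leaving room for a longer hash.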

RevlogNG header:

As the offset of the first data chunk is always zero, the first offset is used to indicate revlog version number and flags.
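
One way to read such a header is sketched below. The bit assignments (version number in the low 16 bits of the first 4 bytes, flag bits above them) and the constant names are assumptions made for illustration; this page does not spell them out.

```python
import struct

# Assumed constants for this sketch.
FORMAT_MASK = 0xFFFF        # low 16 bits: version number
REVLOG_V0 = 0               # original format
REVLOG_NG = 1               # RevlogNG
FLAG_INLINE_DATA = 1 << 16  # index and data interleaved in one file


def read_header(index_bytes):
    """Return (version, flags) taken from the first 4 bytes of an index."""
    (word,) = struct.unpack(">I", index_bytes[:4])
    return word & FORMAT_MASK, word & ~FORMAT_MASK


# Hypothetical first entry: header word followed by the rest of the entry.
first_entry = struct.pack(">I", REVLOG_NG | FLAG_INLINE_DATA) + b"\x00" * 60
version, flags = read_header(first_entry)
is_inline = bool(flags & FLAG_INLINE_DATA)
```

Under these assumptions an original-format index reads back as version 0, since its first offset really is zero.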

RevlogNG also supports interleaving of index and data. This can greatly reduce storage overhead for smaller revlogs. In this format, the data chunk immediately follows its index entry. The position of the next index entry is calculated by adding the compressed length to the offset.
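
The interleaved layout can be walked as in the following sketch. It builds a small synthetic buffer and then steps from one index entry to the next by skipping the entry itself plus its compressed length. The 64-byte entry encoding, the helper names, and the use of raw placeholder bytes instead of real compressed chunks are all assumptions; the version header stored in the first entry is ignored here.

```python
import struct

# Assumed 64-byte entry encoding: offset+flags word, six 32-bit integers,
# and a 32-byte node id field.
NG_ENTRY = struct.Struct(">Qiiiiii32s")


def build_inline(chunks):
    """Concatenate (index entry, data chunk) pairs into one buffer."""
    out = bytearray()
    offset = 0
    for rev, chunk in enumerate(chunks):
        # Placeholder values: lengths equal, linear history, empty node id.
        out += NG_ENTRY.pack(offset << 16, len(chunk), len(chunk),
                             rev, rev, rev - 1, -1, b"")
        out += chunk
        offset += len(chunk)
    return bytes(out)


def walk_inline(buf):
    """Yield each data chunk by hopping from index entry to index entry."""
    pos = 0
    while pos < len(buf):
        comp_len = NG_ENTRY.unpack_from(buf, pos)[1]
        data_start = pos + NG_ENTRY.size
        yield buf[data_start:data_start + comp_len]
        # The next index entry begins right after this data chunk.
        pos = data_start + comp_len


buf = build_inline([b"abc", b"", b"hello"])
chunks = list(walk_inline(buf))
```

For small revlogs this keeps everything in a single file, which is where the storage saving comes from.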


CategoryNewFeatures

RevlogNG (last edited 2020-01-13 04:19:53 by aayjaychan)