Differences between revisions 8 and 9

Mercurial manifest sharding

Problem statement: imagine you have 1M to 1B files (one million to one billion).

Individual manifest RAM overhead is a problem somewhere in this range.

Checkout: we don't want to materialize the working copy on the machine, and we don't want the whole manifest on the local machine.

Limitations of large and linear manifests:

manifest too large for RAM
manifest resolution too much CPU (long delta chains)
committing is slow because entire manifest has to be hashed
impossible for narrow clone to leave out part of manifest as all is needed to calculate new hash
diffing two revisions involves traversing entire subdirectories even if identical

A sample repository with 1M files is hosted on Google Drive (441MB).

1. Proposed Solution

Every directory will have its own manifest revlog. Each directory would thus have its own nodeid. This addresses the issues above:

entire manifest (including sub-manifests) usually does not need to be loaded
delta chains will be shorter since (sub-)manifests are smaller
since directories have nodeids, directories not touched by the commit don't need to be hashed
narrow clone will be able to skip unwanted directories and files and will still be able to calculate a new hash
diffing becomes faster since directory nodeids are stored and entire directories can be skipped
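As a rough illustration of the points above, here is a minimal sketch of per-directory nodeids and a diff that skips identical directories. It is not Mercurial's implementation: the names build and diff, the nested-dict input, and the hashing scheme (SHA-1 over sorted "name NUL id" lines, with a d flag on directory entries) are assumptions made for the example. Real nodeids also take parent revisions into account.

```python
import hashlib

def build(tree):
    """Turn nested dicts into (nodeid, entries), storing an id per directory.

    `tree` maps a name to a file id (str) or to a nested dict (a subdirectory).
    Hypothetical scheme: the directory id is the SHA-1 of its sorted entries.
    """
    entries = {}
    lines = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            sub = build(child)
            entries[name] = sub
            lines.append(f"{name}\0{sub[0]}d\n")  # 'd' marks a directory entry
        else:
            entries[name] = child
            lines.append(f"{name}\0{child}\n")
    node = hashlib.sha1("".join(lines).encode()).hexdigest()
    return node, entries

def diff(a, b, prefix=""):
    """Yield changed paths; subdirectories with equal stored ids are skipped."""
    (_, ea), (_, eb) = a, b
    for name in sorted(set(ea) | set(eb)):
        x, y = ea.get(name), eb.get(name)
        path = prefix + name
        if isinstance(x, tuple) and isinstance(y, tuple):
            if x[0] != y[0]:  # equal ids: the whole subtree is identical
                yield from diff(x, y, path + "/")
        elif x != y:
            # A file changed, or an entry was added/removed (a directory that
            # appears or disappears wholesale is reported as a single path).
            yield path

old = build({"a": {"x": "1" * 40, "y": "2" * 40}, "b": {"z": "3" * 40}})
new = build({"a": {"x": "1" * 40, "y": "4" * 40}, "b": {"z": "3" * 40}})
changed = list(diff(old, new))
```

In this example only a/y differs, so directory b keeps its nodeid and is never descended into.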

Some further benefits:

space savings by only storing the filename in the manifest (but also see Costs below)
'hg log path/to/dir' can be made faster by walking the directory's revlog.

Costs:

more revlogs need to be stored, using more space (+7% number of revlogs for the Mozilla repo, +20%/80k for a Facebook repo)
more revlogs means more disk seeks
when a single file many levels down in a directory is changed, many revlogs need to be written
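The last cost can be made concrete with a small sketch. It assumes one manifest revlog per directory, as proposed above; the function name and the "manifest:" / "filelog:" labels are purely illustrative and do not correspond to real store paths.

```python
def revlogs_written(path):
    """List the revlogs touched when a single file changes (illustrative).

    Every ancestor directory gets a new manifest revision because the nodeid
    of its changed child is part of its own content.
    """
    parts = path.split("/")
    # "" stands for the repository root, shown as "." in the labels.
    dirs = ["/".join(parts[:i]) for i in range(len(parts))]
    return [f"manifest:{d or '.'}" for d in dirs] + [f"filelog:{path}"]

touched = revlogs_written("a/b/c/file.txt")
```

For a file three directories deep this lists four manifest revlogs plus the filelog, where a flat manifest would write one manifest revlog plus the filelog (the changelog is written in both cases).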

2. Current Plan

durin42 to follow these steps (NEEDS UPDATING):

Make manifest parsing lazy (similar to parse_index2)
- this will make manifest lookups O(log(manifest size)) and iteration of the manifest in lexicographic order O(n), instead of requiring a sort.
Add a manifest class that stores the directory nodes and uses the alternate (tree-state) hash for nodes
Hack together some experimental narrow functionality in order to see how things work with push/pull over the network
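The first step could look roughly like the sketch below. It is not the planned implementation and is unrelated to the actual parse_index2 code; the class name LazyManifest is hypothetical. It assumes the flat manifest text is a sorted sequence of lines of the form path, NUL, 40 hex digits, optional flags, newline, and it only records line offsets up front, decoding an entry when it is asked for.

```python
class LazyManifest:
    """Lazily parsed flat manifest (hypothetical sketch).

    Assumes `data` is well formed: sorted by path, one entry per line,
    each line terminated by a newline.
    """

    def __init__(self, data: bytes):
        self._data = data
        # Record only where each line starts; entries are not decoded yet.
        self._starts = []
        pos = 0
        while pos < len(data):
            self._starts.append(pos)
            pos = data.index(b"\n", pos) + 1

    def _entry(self, i):
        start = self._starts[i]
        end = self._data.index(b"\n", start)
        path, rest = self._data[start:end].split(b"\0", 1)
        return path, rest[:40], rest[40:]  # path, hex node, flags

    def find(self, path: bytes):
        """Binary search over line offsets: O(log n) decoded entries."""
        lo, hi = 0, len(self._starts)
        while lo < hi:
            mid = (lo + hi) // 2
            p, node, flags = self._entry(mid)
            if p == path:
                return node, flags
            if p < path:
                lo = mid + 1
            else:
                hi = mid
        return None

    def __iter__(self):
        # Lines are already in lexicographic order, so no sort is needed.
        for i in range(len(self._starts)):
            yield self._entry(i)

data = (b"a/x\0" + b"1" * 40 + b"\n"
        + b"a/y\0" + b"2" * 40 + b"x\n"
        + b"b\0" + b"3" * 40 + b"\n")
m = LazyManifest(data)
```

Building the offset table is still a linear scan, but it avoids splitting and decoding every line into a dictionary before the first lookup.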

3. Alternatives considered

3.1. Sub-manifests at custom positions in tree

A user splits a shard out using a command like hg debugmarkshard foo/bar/baz, which is then stored as a sub-manifest in a different revlog.

Challenges:

someone has to manage the shards
merging shard boundaries has to happen

Objections:

shard boundaries get encoded in the revision history
- durin42 and durham find this displeasing
- it also means users can't have a manifest that's exactly as narrow as their client

3.2. Tree-state hash, but flat manifest

To make the manifest hash something that clients with only a partial checkout can compute, do a per-directory hash that bubbles up, and store entries for those directory nodes in their parent with a d in the flags entry. We considered hashing the filename and starting a shard wherever hash mod == 0, but decided that would probably lead to lots of churn and would also bake the sharding scheme into the manifest hash (which might be suboptimal).

Will require client support to do pull of sharded manifests - that's the second step.

Challenges:

means actually breaking out a shard is expensive, as you have to split/join for network traffic
pushing means the server has to produce a matching-spec narrow manifest to apply the delta
- durham proposes we could store the number of bytes the client had elided in the manifest, which would allow us to produce the same delta as though we were operating on the full manifest, and apply full-manifest deltas even when they contained bits we didn't care about.
need out-of-manifest management of some sort?
doesn't help reduce length of delta chains
doesn't help with working with manifests too large for RAM

Currently hg sparse --include mobile/

doesn't matter if the repo has other stuff, you only get the mobile directory.

hg sparse --enable-profile mobile

profiles live in the repo (.hgsparse)

proposal: have team specific .hgsparse files in directories. Allows changes without contention. (hg sparse --enable-profile mobile[/.hgsparse])

future magic: hg clone --sparse mobile (to avoid initial full clone)

Merges get a little complicated with the regexp matching used now. Proposal: use directories for includes, but allow regex/glob for excludes (so that certain types of files, such as Photoshop files, need not be written).
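A minimal sketch of that include/exclude proposal, using Python's fnmatch module for the exclude globs. The function make_matcher and its arguments are hypothetical and are not part of the sparse extension.

```python
from fnmatch import fnmatchcase

def make_matcher(include_dirs, exclude_globs):
    """Return a predicate: include by directory prefix, exclude by glob."""
    # Normalize "mobile" and "mobile/" to the same prefix.
    includes = [d.rstrip("/") + "/" for d in include_dirs]

    def matches(path):
        if not any(path.startswith(d) for d in includes):
            return False
        # fnmatch's "*" also matches "/", so "*.psd" applies at any depth.
        return not any(fnmatchcase(path, g) for g in exclude_globs)

    return matches

wanted = make_matcher(["mobile/"], ["*.psd"])
```

Because includes are plain directory prefixes, two include sets can be combined by taking the union of the directory lists, which is simpler than reconciling arbitrary regexps.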

5. narrow changelog

See NarrowClonePlan

-  ⇤ ← Revision 8 as of 2015-02-13 19:26:30 → 
  Size: 4638
  Editor: MartinVonZweigbergk
  Comment:
+   ← Revision 9 as of 2015-02-17 23:50:20 → ⇥
  Size: 4895
  Editor: MartinVonZweigbergk
  Comment:
 Line 16:
-Two possible paths forward: explicit shard boundaries or doing a tree-state hash that can elide uninteresting-to-a-client subdirectories.
+A sample repository with 1M files is hosted on [[https://drive.google.com/file/d/0BwpWaFl5KfC_UEZQNzF0eE52R2c/view|Google Drive (441MB)]].
 Line 18:
-A sample repository with 1M files is hosted on [[https://drive.google.com/file/d/0BwpWaFl5KfC_UEZQNzF0eE52R2c/view|Google Drive (441MB)]].
=== Tree-state hash ===
+== Proposed Solution ==
-Line 21:
+Line 20:
-Current plan to make manifest hash something clients with only a partial checkout can do is to do a per-directory hash that bubbles up, and store entries for those directory nodes in their parent with a d in the flags entry. We considered using a hash of filename and hash mod == 0 do a shard, but decided that was probably going to lead to lots of churn, and also bakes the sharding scheme into the manifest hash (which might be suboptimal).
+Every directory will have its own manifest revlog. Each directory would thus have its own nodeid. This addresses the issues above:
-Line 23:
+Line 22:
-Will require client support to do pull of sharded manifests - that's the second step.
+ * entire manifest (including sub-manifests) usually does not need to be loaded
 * delta chains will be shorter since (sub-)manifests are smaller
 * since directories have nodeids, directories not touched by the commit don't need to be hashed
 * narrow clone will be able to skip unwanted directories and files and will still be able to calculate a new hash
 * diffing becomes faster since directory nodeids are stored and entire directories can be skipped
-Line 25:
+Line 28:
-Challenges:
 * means actually breaking out a shard is expensive, as you have to split/join for network traffic
 * pushing means the server has to produce a matching-spec narrow manifest to apply the delta
   * durham proposes we could store the number of bytes the client had elided in the manifest, which would allow us to produce the same delta as though we were operating on the full manifest, and apply full-manifest deltas even when they contained bits we didn't care about. 
 * need out-of-manifest management of some sort?
+Some further benefits:
-Line 31:
+Line 30:
-=== Explicit shards ===
+ * space savings by only storing the filename in the manifest (but also see Costs below)
 * 'hg log path/to/dir' can be made faster by walking the directory's revlog.

Costs:

 * more revlogs need to be stored, using more space (+7% number of revlogs for the Mozilla repo, +20%/80k for a Facebook repo)
 * more revlogs means more disk seeks
 * when a single file many levels down in a directory is changed, many revlogs need to be written

== Current Plan ==

durin42 to follow these steps (NEEDS UPDATING):
 1. Make manifest parsing lazy (similar to parse_index2) 
  - this will make manifest lookups be O(log(manifest size)) and iteration of the manifest in lexicographic order be O(n) instead of requiring a sort.
 1. Add a manifest class that stores the directory nodes and uses the alternate (tree-state) hash for nodes
 1. Hack together some experimental narrow functionality in order to see how things work with push/pull over the network

== Alternatives considered ==

=== Sub-manifests at custom positions in tree ===
-Line 44:
+Line 63:
-durin42 thinks that the normal case for a user with a giant repo is that their manifest will always be of a reasonable size since they have to fit all those files on disk.
+=== Tree-state hash, but flat manifest ===
-Line 46:
+Line 65:
-== Current Plan ==
durin42 to follow these steps:
 1. Make manifest parsing lazy (similar to parse_index2) 
  - this will make manifest lookups be O(log(manifest size)) and iteration of the manifest in lexicographic order be O(n) instead of requiring a sort.
 1. Add a manifest class that stores the directory nodes and uses the alternate (tree-state) hash for nodes
 1. Hack together some experimental narrow functionality in order to see how things work with push/pull over the network
+To make the manifest hash something that clients with only a partial checkout can compute, do a per-directory hash that bubbles up, and store entries for those directory nodes in their parent with a d in the flags entry. We considered hashing the filename and starting a shard wherever hash mod == 0, but decided that would probably lead to lots of churn and would also bake the sharding scheme into the manifest hash (which might be suboptimal).
-Line 53:
+Line 67:
-This doesn't open the door to an actually-sharded manifest, but it may not matter.
+Will require client support to do pull of sharded manifests - that's the second step.
-Line 55:
+Line 69:
-== Discussion (from titanpad) ==
Directory recursive hashes:
We could compute the hash for submanifests by iterating recursively over the directories in the sub manifest content. This would produce a hash that is unique to the commit directory structure, and agnostic to how the manifest is sharded. (similar to git tree hash calculations).
Pros:
 * allows changing the manifest format in the future without changing the hashes
 * allows delivering custom-sharded manifests to users on demand
Cons:
 * more expensive hash algorithm
 * will require a new manifest version flag (since it won't be backwards compatible)
+Challenges:
 * means actually breaking out a shard is expensive, as you have to split/join for network traffic
 * pushing means the server has to produce a matching-spec narrow manifest to apply the delta
   * durham proposes we could store the number of bytes the client had elided in the manifest, which would allow us to produce the same delta as though we were operating on the full manifest, and apply full-manifest deltas even when they contained bits we didn't care about. 
 * need out-of-manifest management of some sort?
 * doesn't help reduce length of delta chains
 * doesn't help with working with manifests too large for RAM

Diff for "TreeManifestPlan"