/!\ This page is primarily intended for Mercurial's developers.

Performance Improvement

Status: Multi-Part Project

Main proponents: Pierre-YvesDavid, GregorySzorc

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

The goal of this page is to gather data about known performance bottlenecks and ideas about how to solve them.

1. Goal

Make Mercurial faster across repositories.

This applies to repositories of any size, even though most of the work usually targets large ones: larger repositories are more likely to hit pathological cases, and they usually attract more funding for performance work.

Performance on various reference repositories is tracked here: http://perf.octobus.net/

2. Performance Areas

2.1. Local Access

2.1.1. Status

Getting the status of files is an important operation. We can distinguish several pieces where performance matters:

Storing the expected state of files (dirstate): the format currently used is a flat file that has to be fully rewritten on each update. Facebook wrote a tree-based format for lighter updates; that storage is also meant to be mmap-friendly (see MMapPlan).
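
The cost of the flat format is easier to see from its record layout: entries are variable-length records packed back to back, so changing one entry means rewriting everything after it. A minimal Python sketch of the historical dirstate v1 entry layout (simplified; the real file also begins with the two 20-byte parent nodes, omitted here):

```python
import struct

# dirstate v1 entry header: state (1 byte), then mode, size, mtime and
# filename length as 4-byte big-endian signed integers
ENTRY = struct.Struct(">cllll")

def pack_entry(state, mode, size, mtime, path):
    data = path.encode("utf-8")
    return ENTRY.pack(state, mode, size, mtime, len(data)) + data

def unpack_entries(blob):
    # records are variable length and packed back to back: any change
    # forces a rewrite of everything that follows it
    pos = 0
    while pos < len(blob):
        state, mode, size, mtime, flen = ENTRY.unpack_from(blob, pos)
        pos += ENTRY.size
        yield state, mode, size, mtime, blob[pos:pos + flen].decode("utf-8")
        pos += flen

blob = pack_entry(b"n", 0o644, 12, 1576540000, "src/main.rs")
blob += pack_entry(b"a", 0o644, -1, -1, "README")
```

A tree-based format instead groups entries by directory, so an update only has to rewrite the affected subtrees.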

In addition, a Rust rewrite of the status computation can yield impressive performance gains.

2.1.2. Nodemap

Many operations need to validate that some nodes are in the repository (loading bookmarks, discovery, revsets, etc.). Building this mapping on each invocation of Mercurial gets slow for repositories with millions of commits.

  • {X} Get a persistent {node → rev} mapping
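
A persistent nodemap boils down to writing the {node → rev} mapping to disk once and reloading it, instead of rebuilding it from the changelog index on every invocation. A minimal sketch using a hypothetical fixed-width record file (this is not the on-disk format Mercurial actually adopted):

```python
import hashlib
import os
import struct
import tempfile

REC = struct.Struct(">20sI")   # 20-byte binary node + 4-byte big-endian rev

def save_nodemap(path, node_to_rev):
    # write the mapping once; later invocations just reload it
    with open(path, "wb") as f:
        for node, rev in node_to_rev.items():
            f.write(REC.pack(node, rev))

def load_nodemap(path):
    # reloading fixed-width records is O(n) in file size, with no
    # changelog parsing involved
    with open(path, "rb") as f:
        data = f.read()
    return {node: rev for node, rev in REC.iter_unpack(data)}

# three fake 20-byte nodes, as a stand-in for changelog hashes
nodes = {hashlib.sha1(b"%d" % r).digest(): r for r in range(3)}
path = os.path.join(tempfile.mkdtemp(), "nodemap")
save_nodemap(path, nodes)
```

A real implementation also needs incremental appends and validation against the changelog, which is where most of the design work lies.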

2.1.3. Branchmap

Computing the branch map from scratch can be very expensive. Instead we use a cache. However, this cache can end up being quite expensive too:

  • it can be outdated or stale,
  • making sure the cache and its contents are valid can be pretty expensive (see the Nodemap section right above this one),
  • it grows as large as the number of heads in the repository, and it keeps growing even for closed branches,
  • it affects all repositories, even those that use no (or very few) named branches,
  • each update requires the full file to be rewritten.

So there are multiple things we could improve:

  • removing the need for expensive validation by moving this into an index space (see ComputedIndexPlan),
  • making updates of the in-memory data cheaper by using topological head information,
  • making updates of the on-disk file incremental,
  • making mmap an option for accessing the data (see MMapPlan),
  • having a more compact representation (leveraging the topological head information).
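
Several of these ideas boil down to folding only the revisions added since the cache was written into the {branch → heads} mapping, instead of recomputing it. A toy sketch under a made-up data model (each revision carries a branch name and its parent revisions; this is not Mercurial's actual branchmap code):

```python
def update_branchmap(branchmap, tiprev, revs):
    """Fold revisions newer than `tiprev` into a {branch: heads} cache.

    `revs` maps rev -> (branch, parents); only revs above `tiprev` are
    processed, so the cost is proportional to the number of new
    revisions, not to the whole repository.
    """
    for rev in sorted(r for r in revs if r > tiprev):
        branch, parents = revs[rev]
        heads = branchmap.setdefault(branch, set())
        # a parent on the same branch stops being a head of that branch
        heads.difference_update(parents)
        heads.add(rev)
    return branchmap

# rev -> (branch, parents); a small linear-ish history
revs = {0: ("default", []), 1: ("default", [0]), 2: ("stable", [1]),
        3: ("default", [1])}
cache = update_branchmap({}, -1, revs)   # full computation from scratch
```

The same function, called with a cache loaded from disk and its recorded tip revision, performs the cheap incremental update.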

2.1.4. Manifest Access

Many operations require manifest access, so it is important for this computation to be fast. Some work has been done in this area:

  • (./) sparse-revlog fixes pathological delta chains.
  • (./) a handful of recently used manifests are now cached.
  • {i} tree manifests exist, but are still experimental.
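
The manifest caching above amounts to a small LRU cache keyed by manifest node. An illustrative sketch (the class and names are made up, not Mercurial's API):

```python
from collections import OrderedDict

class ManifestCache:
    """Keep the last `capacity` parsed manifests around (LRU eviction)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, node, load):
        if node in self._cache:
            self._cache.move_to_end(node)      # cache hit: mark recently used
            return self._cache[node]
        value = load(node)                     # cache miss: expensive parse
        self._cache[node] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)    # evict least recently used
        return value

calls = []
def load(node):
    calls.append(node)
    return {"files": [node]}    # stand-in for a parsed manifest

cache = ManifestCache(capacity=2)
for node in ("a", "b", "a", "c"):   # second "a" is a hit, "c" evicts "b"
    cache.get(node, load)
```

This pays off because commands frequently re-read the same few manifests (working copy parent, merge ancestors, recent history).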

2.1.5. Copy Tracing

In some situations, using the copy information in a repository can be very slow. A new algorithm based on precomputed information is in the works.

2.2. Exchange

2.2.1. Discovery

Discovery is an important step of any push or pull operation. It is especially a problem for repositories with many heads (thousands). The known pathological cases have been solved; however, we could reduce the base time further with more work.

  • (./) an algorithm reducing the number of round-trips,
  • (./) a Rust implementation of the graph-related computations,
  • {X} a persistent nodemap to reduce server-side query cost,
  • {X} using an abstraction to avoid exchanging the nodes of all heads.
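
The round-trip issue can be illustrated with a toy set-discovery exchange: the client asks the server about batches of nodes rather than one node per request, so the number of wire exchanges drops from O(nodes) to O(nodes / batch size). This deliberately ignores the real algorithm's DAG sampling and is only a sketch:

```python
def discover_common(local, remote_known, batch=32):
    """Find which of `local`'s nodes the remote knows, batching the queries.

    `remote_known` stands in for the server; each batch is one round-trip.
    Returns (common nodes, number of round-trips used).
    """
    common, round_trips = set(), 0
    todo = sorted(local)
    for i in range(0, len(todo), batch):
        sample = todo[i:i + batch]
        round_trips += 1                       # one wire exchange per batch
        common.update(n for n in sample if n in remote_known)
    return common, round_trips

local = set(range(100))                 # nodes the client has
remote = set(range(50, 150))            # nodes the server has
common, trips = discover_common(local, remote, batch=32)
```

The actual discovery protocol goes further, using the graph structure to infer answers for whole ancestor sets instead of querying every node.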

2.2.2. Server-side Changegroup Performance

Servers tend to spend a lot of CPU and bandwidth computing and transferring changegroup data.

2.2.2.1. Caching bundles

The most effective way to alleviate this resource usage is by serving static, pre-generated changegroup data instead of dynamically generating it at request time. A server-side cache of changegroup data would fall into this bucket. The "clone bundles" feature which serves initial clones from URLs is one implementation of this. But it only addresses the initial clone case. Subsequent pulls still result in significant load on the server. There is support for a "remote changegroup" bundle2 part that allows servers to advertise the URL of a pre-generated changegroup. There is a prototype for an extension using this.

  • (./) Manually generated clone bundles
  • (./) Manually generated pull bundles
  • {i} Automatically generated clone/pull bundles (prototype exists)
  • {X} Pregenerated bundles for stream clones
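
The "clone bundles" mechanism above is driven by a `.hg/clonebundles.manifest` file on the server: each line advertises the URL of a pre-generated bundle (produced for instance with `hg bundle --all --type zstd-v2`) plus attributes such as `BUNDLESPEC`, and clients pick a compatible entry before falling back to a regular pull. A sketch of such a manifest (the URLs and filenames are examples):

```
https://hg.example.com/bundles/repo-full.zstd.hg BUNDLESPEC=zstd-v2
https://hg.example.com/bundles/repo-full.gzip.hg BUNDLESPEC=gzip-v2
```

Clients that do not understand an entry's bundlespec simply skip it, which makes it safe to advertise newer compression formats alongside older ones.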

2.2.3. Delta computation

  • (./) sparse-revlog improves delta chains and their reusability,
  • (./) reduced the number of files to investigate for merges,
  • {X} more deltas could be reused, avoiding computation on both the server and client side.

2.2.4. Compression

There is plenty of potential to optimize the server for changegroup generation. As of Mercurial 4.0, changegroups (with the exception of the changelog) are effectively collections of single delta chains per revlog. For generaldelta repos, many deltas on disk are reused. However, the server still needs to decompress the revlog entries on disk to obtain the raw deltas, then recompress them as part of the changegroup compression context. Furthermore, if there are multiple delta chains in the revlog, the server will need to compute a new delta for those entries. This contributes to overhead, especially the decompression and recompression. Switching away from zlib for both revlog storage and wire protocol compression will help tremendously, as zstd can be 2x more efficient in both decompression and compression.

  • (./) zstd storage is now officially supported

  • {X} zstd is not yet the default.
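
The overhead described above can be made concrete with revlog-style chunks: zlib-compressed chunks are recognizable by their leading 'x' byte (which is simply the first byte, 0x78, of the zlib header), while raw chunks carry a 'u' marker. Serving a changegroup today means decompressing the stored chunk and recompressing it for the wire; a sketch under those assumptions:

```python
import zlib

def read_chunk(chunk):
    """Decode a revlog-style chunk: zlib data (b'x'), raw (b'u'), or empty."""
    if not chunk:
        return b""
    marker = chunk[0:1]
    if marker == b"x":
        # the 'x' is part of the zlib stream itself, so pass it through
        return zlib.decompress(chunk)
    if marker == b"u":
        return chunk[1:]
    raise ValueError("unknown chunk marker %r" % marker)

def reencode_for_wire(chunk, wire_compress):
    # today's server path: decompress the stored chunk, then recompress
    # it inside the wire protocol's compression context
    return wire_compress(read_chunk(chunk))

delta = b"a revlog delta " * 10
on_disk = zlib.compress(delta)          # already starts with b'x' (0x78)
wire = reencode_for_wire(on_disk, zlib.compress)
```

Sending `on_disk` as-is, when the client supports the storage compression, would skip both calls entirely, which is what the next section is about.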

2.2.5. Skipping the decompression/compression stage

While compression could be made more efficient, it would be better to avoid the compression work altogether. There exists a "streaming clone" feature that essentially does a file copy of revlogs from server to client. However, this only applies to the initial clone. It should be possible to extend this feature to subsequent pulls. So instead of transferring a changegroup with on-the-fly computed delta chains, the server would transfer the raw data in its revlogs, including compression. This feature would not be suitable for all environments, as the transfer size would likely increase and clients would need to support, and effectively inherit, the settings of the server. However, it would substantially reduce server-side CPU requirements.

  • {X} support for transparently sending compressed deltas directly from disk.

3. Roadmap

Various things that have been in the air:

  • {X} skipping useless buffering,

  • {X} zstd for diff,

  • {X} alternative backend

4. See Also


CategoryDeveloper CategoryNewFeatures

PerformancePlan (last edited 2019-12-18 11:07:17 by Pierre-YvesDavid)