Differences between revisions 1 and 2
Revision 1 as of 2016-12-02 15:02:42
Size: 972
Comment: Creating a basic page, Greg told me he might fill it if I do so.
Revision 2 as of 2016-12-04 19:20:30
Size: 2563
Editor: GregorySzorc
Comment: notes on server-side changegroup performance
Note:

This page is primarily intended for developers of Mercurial.

Performance Improvement Plan

Status: In progress

Main proponents: Pierre-YvesDavid, GregorySzorc

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

The goal of this page is to gather data about known performance bottlenecks and ideas about how to solve them.

1. Goal

(I'm creating this page with clone/push/pull in mind) XXX fill me more

2. Detailed description

All kinds of material can go here: solution descriptions, alternative solutions, etc.

2.1. Server-side Changegroup Performance

Servers tend to spend a lot of CPU and bandwidth computing and transferring changegroup data.

The most effective way to reduce this resource usage is to serve static, pre-generated changegroup data instead of generating it dynamically at request time. A server-side cache of changegroup data falls into this category. The "clone bundles" feature, which serves initial clones from URLs, is one implementation of this approach, but it only addresses the initial clone case; subsequent pulls still put significant load on the server. There is also support for a "remote changegroup" bundle2 part that allows servers to advertise the URL of a pre-generated changegroup, but no extensions or features currently rely on it.
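To make the server-side cache idea concrete, here is a minimal sketch in Python. It is not Mercurial code: the `ChangegroupCache` class, its `get` method, and the `generate` callback are hypothetical names, and keying the cache on the sorted (common, heads) node sets is an assumption about what identifies a pull request.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple

Nodes = Tuple[str, ...]


class ChangegroupCache:
    """Hypothetical in-memory cache of pre-generated changegroup data."""

    def __init__(self, generate: Callable[[Nodes, Nodes], bytes]) -> None:
        # `generate` stands in for the expensive, dynamic changegroup generation.
        self._generate = generate
        self._store: Dict[str, bytes] = {}
        self.misses = 0

    @staticmethod
    def _key(common: Nodes, heads: Nodes) -> str:
        # Both tuples are already sorted, so equivalent requests share a key.
        payload = "\n".join(common) + "\n--\n" + "\n".join(heads)
        return hashlib.sha1(payload.encode("ascii")).hexdigest()

    def get(self, common: Iterable[str], heads: Iterable[str]) -> bytes:
        """Return cached data, generating it only on the first request."""
        c, h = tuple(sorted(common)), tuple(sorted(heads))
        key = self._key(c, h)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(c, h)
        return self._store[key]
```

A real deployment would also need eviction and invalidation when the repository changes; the sketch only shows that repeated identical requests can be served without regenerating the data.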

There is plenty of potential to optimize changegroup generation on the server. As of Mercurial 4.0, changegroups (with the exception of the changelog) are effectively collections of a single delta chain per revlog. For generaldelta repos, many deltas on disk are reused. However, the server still needs to decompress the revlog entries on disk to obtain the raw deltas and then recompress them as part of the changegroup compression context. Furthermore, if there are multiple delta chains in the revlog, the server needs to compute new deltas for those entries. All of this adds overhead, especially the decompression and recompression. Switching away from zlib for both revlog storage and wire protocol compression will help tremendously, as zstd can be 2x more efficient in both decompression and compression.
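The decompress-then-recompress step described above can be illustrated with a small Python sketch that uses only the standard `zlib` module. It is a simplification, not Mercurial's implementation: the function names and the 4-byte length-prefix framing are hypothetical, and every stored entry is assumed to be individually zlib-compressed.

```python
import zlib
from typing import List


def store_entries(deltas: List[bytes]) -> List[bytes]:
    """Compress each delta on its own, standing in for on-disk revlog entries."""
    return [zlib.compress(d) for d in deltas]


def emit_changegroup(stored: List[bytes]) -> bytes:
    """Decompress every stored entry, then recompress in one stream context."""
    compressor = zlib.compressobj()
    out = []
    for entry in stored:
        raw = zlib.decompress(entry)  # cost 1: undo the storage compression
        frame = len(raw).to_bytes(4, "big") + raw  # hypothetical framing
        out.append(compressor.compress(frame))  # cost 2: recompress for the wire
    out.append(compressor.flush())
    return b"".join(out)


def read_changegroup(data: bytes) -> List[bytes]:
    """Client side: decompress the stream and split it back into deltas."""
    stream = zlib.decompress(data)
    deltas, pos = [], 0
    while pos < len(stream):
        size = int.from_bytes(stream[pos:pos + 4], "big")
        pos += 4
        deltas.append(stream[pos:pos + size])
        pos += size
    return deltas
```

Each delta passes through the decompressor once and the compressor once per request, which is why a faster codec on both sides reduces server CPU. Trying zstd in place of zlib here would require a third-party binding, which the sketch deliberately avoids.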

3. Roadmap

(various things I heard about please

  • {X} better control over compression

  • {X} skipping useless buffering

  • {X} zstd for storage

  • {X} zstd for diff

  • {X} Streaming clone

  • {X} clone bundle for pull

  • {X} alternative backend

  • {X} improved discovery

4. See Also


CategoryDeveloper CategoryNewFeatures

PerformancePlan (last edited 2020-07-21 01:59:37 by JoergSonnenberger)