caching pull - stable partitioning of bundle requests

Boris FELD lothiraldan at gmail.com
Thu Oct 4 15:15:01 UTC 2018


On 27/09/2018 18:15, Gregory Szorc wrote:
> On Wed, Sep 26, 2018 at 11:13 AM Boris FELD
> <boris.feld at octobus.net> wrote:
>
>     Hi everyone,
>
>     Pulling from a server involves expensive server-side computation
>     that we wish to cache. However, since the client can pull any
>     arbitrary set of revisions, grouping and dispatching the data to
>     be cached is a hard problem.
>
>     When we implemented the new discovery for obsolescence markers, we
>     developed a "stablerange" method that provides an efficient way to
>     slice the changeset graph into ranges. In addition to solving the
>     obsolescence markers discovery problem, this "stablerange"
>     principle seemed useful for more use cases, in particular the
>     caching of pulls.
>
>     Right now, with the current pull bundle implementation, here is
>     how it works: you manually create and manually declare bundles
>     containing either all changesets (which could also be used for
>     clone bundles) or more specific ones. When the client requests
>     some changesets, the server searches for a bundle containing the
>     needed range and sends it. This often involves more than the
>     requested data, so the client needs to filter out the extraneous
>     data, then run a discovery to catch any changesets missing from
>     the bundle. If the server doesn't find a valid pull bundle, a
>     normal discovery is done. The manual bundle management is
>     suboptimal, the search for an appropriate bundle has poor
>     algorithmic complexity, and the extra roundtrip and discovery add
>     extra slowness.
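>
>     (Illustrative sketch only, with hypothetical names, not
>     Mercurial's actual pull-bundle code: the lookup over manually
>     declared bundles is a linear scan, and the bundle found may
>     contain more than the request.)
>
>         class DeclaredBundle:
>             def __init__(self, path, revs):
>                 self.path = path       # pre-generated bundle on disk
>                 self.revs = set(revs)  # changesets the bundle contains
>
>             def covers(self, requested):
>                 # usable if it holds at least the requested changesets
>                 return requested <= self.revs
>
>         def find_pullbundle(declared, requested):
>             # cost grows with the number of declared bundles; a match
>             # larger than the request forces the client to filter the
>             # extra data and run another discovery round afterwards
>             for bundle in declared:
>                 if bundle.covers(requested):
>                     return bundle
>             return None  # no usable bundle: normal discovery instead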
>
>     This weekend, we built a "simple" prototype that uses "stablerange"
>     to slice the changegroup request in "getbundle" into multiple
>     bundles that can be reused from one pull to another. That slicing
>     happens as part of a normal pull, during the getbundle call and
>     after the normal discovery. There is no need for an extra
>     discovery and getbundle call after it.
>
>     With this "stablerange"-based strategy, we start from the set of
>     requested changesets and generate a set of "standard" ranges
>     covering all of them. This slicing has a good algorithmic
>     complexity that depends on the size of the selected "missing" set
>     of changesets, so the associated cost will scale well with the
>     size of the pull. In addition, we no longer have to do an
>     expensive search through a list of existing bundles. This helps
>     small pulls scale and increases the number of bundles we can
>     cache, as the time spent selecting bundles no longer depends on
>     the number of cached ones. Since we can exactly cover the client
>     request, we also no longer need to issue an extra pull roundtrip
>     after the cache retrieval.
>
>     That slicing focuses on producing ranges that:
>
>       * Have a high chance of being reusable by a pull selecting
>         similar changesets,
>
>       * Gather most of the changesets in large bundles.
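>
>     Here is a minimal sketch of the idea on a purely linear history
>     (the real "stablerange" algorithm works on the full DAG and is not
>     shown here): covering the requested revisions with size-aligned,
>     power-of-two subranges makes the same large blocks reappear from
>     one pull to the next, so their cached bundles can be reused.
>
>         def standard_slices(start, stop):
>             # Yield (offset, size) subranges covering [start, stop).
>             # Each subrange is aligned on its own power-of-two size,
>             # so the large blocks are identical across pulls.
>             rev = start
>             while rev < stop:
>                 # largest aligned block starting at `rev`...
>                 size = rev & -rev if rev else 1 << (stop - 1).bit_length()
>                 # ...shrunk until it fits inside the requested range
>                 while size > stop - rev:
>                     size //= 2
>                 yield rev, size
>                 rev += size
>
>         first = list(standard_slices(0, 300))
>         # [(0, 256), (256, 32), (288, 8), (296, 4)]
>         later = list(standard_slices(0, 350))
>         # [(0, 256), (256, 64), (320, 16), (336, 8), (344, 4), (348, 2)]
>         # the big (0, 256) block is shared, so it is served from cache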
>
>
>     This caching strategy inherits the nice "stablerange" properties
>     regarding repository growth:
>
>       * When a few changesets are appended to a repository, only a
>         few ranges have to be added.
>
>       * The overall number of ranges (and associated bundles) needed
>         to represent all possible ranges has an O(N log(N))
>         complexity.
>
>
>     For example, here are the 15 ranges selected for a full clone of
>     mozilla-central:
>
>       * |262114 changesets|
>
>       * |30 changesets|
>
>       * |65536 changesets|
>
>       * |32741 changesets|
>
>       * |20 changesets|
>
>       * |7 changesets|
>
>       * |8192 changesets|
>
>       * |243 changesets|
>
>       * |13 changesets|
>
>       * |114 changesets|
>
>       * |14 changesets|
>
>       * |32 changesets|
>
>       * |16 changesets|
>
>       * |8 changesets|
>
>       * |1 changesets|
>
>
>
>     If we only clone a subset of the repository, the larger ranges get
>     reused (hg clone --rev -5000):
>
>       * |262114 changesets found in caches|
>
>       * |30 changesets found in caches|
>
>       * |65536 changesets found in caches|
>
>       * |32741 changesets found in caches|
>
>       * |20 changesets found in caches|
>
>       * |7 changesets found in caches|
>
>       * |2048 changesets found|
>
>       * |1024 changesets found|
>
>       * |482 changesets found|
>
>       * |30 changesets found|
>
>       * |32 changesets found|
>
>       * |1 changesets found|
>
>       * |7 changesets found|
>
>       * |4 changesets found|
>
>       * |2 changesets found|
>
>       * |1 changesets found|
>
>     As you can see, the larger ranges of this second pull are common
>     with the previous pull, allowing cached bundles to be reused.
>
>     The prototype is available in a small "pullbundle" extension. It
>     focuses on the slicing itself; we did not implement anything
>     fancy for the cache storage and delivery. We simply store the
>     generated bundles on disk and read them back when they are needed
>     again. Others, like Joerg Sonnenberger or Gregory Szorc, are
>     already working on the "cache delivery" problem.
>
>     We are getting good results out of that prototype when testing it
>     on clones of mozilla-central and netbsd-src. See the "Example
>     Result" section for details.
>
>     The prototype is up and running on our hgweb "mirror" instance
>     https://mirror.octobus.net/.
>
>     The extension comes with a small debug command that produces
>     statistics about the ranges that multiple random pulls would use.
>
>     The "stablerange" implementation currently still[1] live in the
>     evolve extensions, so we put the extensions in the same repository
>     for simplicity as "pullbundle". This is not ideal but was a simple
>     solution in the time we could dedicate. This extension is not part
>     of any official release yet. To test it you have to install it
>     from the repository for now:
>     https://www.mercurial-scm.org/repo/evolve/#default
>
>     The extension's code is here:
>     https://www.mercurial-scm.org/repo/evolve/file/tip/hgext3rd/pullbundle.py
>
>     The prototype's performance is not stellar, but good enough to
>     give useful results in a reasonable amount of time. A
>     production-grade implementation of the stablerange algorithm and
>     storage will fix that. There is also room for improvement in the
>     algorithms themselves; multiple sub-problems can be improved. We
>     have started having regular meetings with university researchers
>     working on graph theory who are interested in the problem space.
>
>     [1] The stablerange principle has been validated in the field and
>     is ready to be upstreamed.
>
>
> This experimentation is very useful and shows great promise!
>
> Intelligent "slicing" of data in order to facilitate high cache hit
> rates and lower server work and client overhead is definitely worth
> pursuing.
>
> FWIW I submitted my caching series for wire protocol version 2 a few
> hours ago. The meaningful patches are at
> https://phab.mercurial-scm.org/D4773 and
> https://phab.mercurial-scm.org/D4774. This work focuses on transparent
> whole-response caching, however, and does not (yet) implement a
> caching layer for e.g. storage access. I will note that I *really*
> want to enable Mercurial servers to offload as much data transfer to
> static file servers as possible.
>
> An open item to solve is how we're going to facilitate bulk data
> transfer using fewer roundtrips in wire protocol version 2. Right now,
> the client has to send a *lot* of commands and data to the server (all
> the manifest and file nodes). We know we must do something better. I
> think this slicing idea could definitely be employed to facilitate
> higher cache hit rates. How, exactly, I'm not sure.
>
> Something else to consider is that right now, an `hg clone` will
> preserve changelog order from server to client. This is trivial to
> change. And it is arguably an implementation detail of revlogs, which
> preserve insertion order. (And until
> https://www.mercurial-scm.org/repo/hg-committed/rev/db5501d93bcf it
> was the behavior for manifests and filelogs as well.) I'm not sure how
> important it is to preserve this property. But if we start slicing
> changesets, we could easily break this behavior.
>
> This email caused me to consider the possibility that the "content
> redirects" feature proposed in D4774 should (eventually) support
> multiple locations. Then, a server could slice a large response into
> multiple payloads and have the client retrieve them separately.
>
> Alternatively, since wire protocol version 2 currently performs a lot
> of granular requests, perhaps the server could inform the client of
> preferred slicing for data fetching and the client could follow up
> accordingly. How this would interact with bulk data transfer commands,
> I'm not sure.
>
> There's definitely a lot to think about and I hope this slicing
> experimentation finds a home in core someday!
This slicing is based on the "stablerange" algorithm, the same one used
to bisect obsmarkers during obsmarkers discovery.

The implementation of "stablerange" currently in use (in evolve) is a
"research implementation". It is the result of the initial search for a
solution to stable slicing with good properties, and work on it was not
pushed further than proving the approach has such properties. It turns
out this implementation is "good enough" to provide usable solutions in
areas where the alternatives were severely lacking (eg: pullbundle
slicing, obsmarkers discovery, ...). However, that implementation needs
to be made "production-grade" before we can activate it by default. The
goal is to be as fast as today for most commands (commit, for example)
and faster for exchange (obs-markers/pullbundle). In particular, cache
size and overall performance can be improved.

So, overall, we have a good solution that should move upstream. The
overall algorithm is clear, so it is mostly "straightforward"
engineering work to perform. That said, the amount of engineering work
is significant. There are three high-level items to be completed:

    - the basic caches should get a unified and simpler implementation
(probably integrated into changelog-v3 for simplicity),

    - the stable range cache itself will get a better storage format
and will also store less data,

    - if needed, a compiled implementation will boost performance to
the point where we don't have to worry about it anymore (maybe in
Rust?).


The stable range computation uses a couple of values inherent to each
node that we have to cache. The two simplest ones could just go into
the changelog index:

    - the depth of a changeset, ie: `len(::X)`,

    - its first merge ancestor, ie: `max(merge() and ::X)`.

Stable range also relies on a stable sort algorithm, which requires
caching some data for each merge node. This could also be stored in the
changelog (this time, in the data part).
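
To make the two per-node values concrete, here is a naive sketch (not
the evolve code, and deliberately quadratic for clarity; a real cache
would maintain these incrementally alongside the changelog index). The
`parents()` callback and all names here are assumptions for
illustration only:

    def build_node_caches(parents, tiprev):
        """For each rev, compute depth (len(::rev)) and the highest
        merge in ::rev (-1 if there is none)."""
        ancestors = {}    # rev -> set of ::rev, including rev itself
        depth = {}
        first_merge = {}
        for rev in range(tiprev + 1):
            p1, p2 = parents(rev)
            anc = {rev}
            for p in (p1, p2):
                if p != -1:
                    anc |= ancestors[p]
            ancestors[rev] = anc
            depth[rev] = len(anc)
            if p1 != -1 and p2 != -1:
                first_merge[rev] = rev       # rev itself is a merge
            else:
                first_merge[rev] = max(
                    (first_merge[p] for p in (p1, p2) if p != -1),
                    default=-1,
                )
        return depth, first_merge

    # tiny DAG: 0 -- 1 -- 3 and 0 -- 2 -- 3 (rev 3 is a merge)
    edges = {0: (-1, -1), 1: (0, -1), 2: (0, -1), 3: (1, 2)}
    depth, first_merge = build_node_caches(lambda r: edges[r], 3)
    # depth == {0: 1, 1: 2, 2: 2, 3: 4}
    # first_merge == {0: -1, 1: -1, 2: -1, 3: 3}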

Then comes the actual stable range algorithm and its related cache. It
computes and stores details of how a given range gets sliced into
smaller ranges. Currently, we use sqlite to store this cache. This was
a quick way to get storage running while experimenting. However, sqlite
now accounts for a large share of the current performance cost and
cache size, so we should move away from it in favor of a simpler
on-disk format. In addition, we don't need to cache as much data. In
most cases, slicing a range is trivial and won't require any complex
computation. We should compute these cases on the fly instead of
writing them to the cache. This will shrink the cache size
significantly (in addition to the gains from moving away from sqlite).
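
As a rough sketch of that lookup policy (assumed structure and names,
not the extension's current storage): recompute the trivial cases on
demand and persist only the expensive ones.

    def slices_of(rng, cache, is_trivial, cheap_slices, full_slices):
        """Return the sub-ranges of `rng`, persisting only the costly
        cases; the three callbacks stand in for the real stablerange
        computations."""
        if is_trivial(rng):
            return cheap_slices(rng)       # cheap: recomputed, never stored
        if rng not in cache:
            cache[rng] = full_slices(rng)  # expensive: worth writing to disk
        return cache[rng]

Combined with a simpler flat on-disk format for `cache`, this should
keep the cache small while avoiding the sqlite overhead.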

Finally, all this computation mostly deals with simple integers,
tuples, and lists, an area where Python is notably sub-optimal. Writing
this in a low-level language and giving it access to the low-level
implementation of the changelog's index would provide a massive
performance boost. The overall computational complexity of the
stable-range algorithms is good, so given our previous experience
writing compiled versions of such algorithms, we can expect the
performance concerns to go away once we have one.


The road for moving this into Core is clear, but not short. So far we
have not been able to free the necessary time to do it. Between the
paying-client work we have to do to pay salaries (and keep users happy)
and all the time we are already investing in the community, we are
already fairly busy.

In early 2017, Bitbucket gave $13,500 to the Mercurial project, to be
spent on helping evolution move forward. As far as we know, this money
is still unspent. Since stable range is a critical part of obsmarkers
discovery, unlocking this money to be spent on upstreaming stable range
would be a good idea (and fits its initial purpose). Paying for this
kind of work would reduce the contention with client work and help us,
or others, dedicate time to it sooner rather than later.