caching pull - stable partitioning of bundle requests

Tue Feb 19 04:53:34 EST 2019

On 15/02/2019 21:16, Pulkit Goyal wrote:
>
>
> On Thu, Oct 4, 2018 at 6:16 PM Boris FELD <lothiraldan at gmail.com> wrote:
>
>     The road for moving this in Core is clear, but not short. So far
>     we have not been able to free the necessary time to do it. Between
>     the paying-client work we have to do to pay salaries (and keep
>     users happy) and all the time we are already investing in the
>     community, we are already fairly busy.
>
>
>     In early 2017, Bitbucket gave $13,500 to the Mercurial project to
>     be spent to help evolution to move forward. As far as we know,
>     this money is still unspent. Since stable range is a critical part
>     of obsmarkers discovery, unlocking this money to be spent on
>     upstreaming stable range would be a good idea (and fits its
>     initial purposes). Paying for this kind of work will reduce the
>     contention with client work and help us, or others, to dedicate
>     time for it sooner rather than later.
>
>
> I definitely agree that obsmarker discovery is a critical part.
> Pulling from `hg-committed` is sometimes slower than pulling from a
> repo 5-7x the size of hg-committed whose server has thousands of
> heads.
>
> Do you have any updates on the stable-range cache? In its current
> state the cache is pretty big, and a lot of people have faced problems
> with the cache size. Also, in the case of strip or some other commands,
> it rebuilds the cache, which takes more than 10 minutes on a large repo
> and is definitely a bummer. Are you working on making that better
> and making it smaller? How is experimentation with evolve+stablerange going?

# Regarding the cache-size:

We know that the current version caches many entries that are trivial to
compute and do not need to be cached. In addition, the current storage
(SQLite) does not seem very efficient.

So the next iteration of the cache should be significantly smaller.

# Regarding cache invalidation:

A lot of the data in the caches is an inherent property of the
changeset and therefore immutable. It is easy to preserve it during
strip to avoid having to recompute things from scratch. In addition,
this immutable data should be exchanged during pull alongside the
associated changesets to avoid clients recomputing the same data over
and over.
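
To illustrate the idea, here is a minimal sketch, not the Evolve or Core
implementation. It assumes the cached value depends only on the changeset
itself, so it can be keyed by node hash instead of revision number. All
names (`NodeKeyedCache`, `expensive_property`, `export`, `update`) are
hypothetical, and the strip is simplified to removing one changeset that
has no descendants.

```python
# Hypothetical sketch: none of these names are Mercurial or Evolve APIs.

class NodeKeyedCache:
    """Cache immutable per-changeset data, keyed by node hash."""

    def __init__(self, compute):
        self._compute = compute   # function: node -> value
        self._data = {}           # node -> cached value
        self.computations = 0     # how often we had to compute from scratch

    def get(self, node):
        if node not in self._data:
            self._data[node] = self._compute(node)
            self.computations += 1
        return self._data[node]

    def export(self, nodes):
        """Entries a server could send alongside changesets during pull."""
        return {n: self._data[n] for n in nodes if n in self._data}

    def update(self, entries):
        """Accept precomputed entries instead of recomputing them."""
        self._data.update(entries)


def expensive_property(node):
    # Stand-in for a costly value that depends only on the changeset.
    return len(node) * 2


server = NodeKeyedCache(expensive_property)
changelog = ["aaa1", "bbb2", "ccc3", "ddd4"]
for node in changelog:
    server.get(node)

# Simplified strip: "ccc3" goes away and "ddd4" gets a new revision number.
# Entries keyed by node stay valid, so nothing is recomputed.
changelog.remove("ccc3")
for node in changelog:
    server.get(node)

# Pull: the client receives the cached entries with the changesets.
client = NodeKeyedCache(expensive_property)
client.update(server.export(changelog))
for node in changelog:
    client.get(node)
```

In this sketch the server computes each value once even across the strip,
and the client computes nothing at all.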

The current implementation is an experimental/research implementation;
all this should get smoothed out directly in Core during the upstreaming.

# Regarding cache-computation speed:

The current implementation is a "research" version written in Python. It
is not geared toward efficiency and contains a lot of indirections that
were helpful to reach the current solution but are now getting in the
way of performance.

The initial implementation (in Evolve) focused on finding a solution
with good scaling properties (good computational and space complexity).
However, we did not spend much time improving the "constant" factor.
Now that we know where we are headed, we can have a much better
implementation.

Once we have better on-disk storage, native code, and client/server
exchange of most of the data, the impact of stable-range should drop to
a negligible level.

# Regarding what's next:

The experimental implementation cleared up the unknowns around
stable-range computation and caching. However, even if the road is clear,
a sizable amount of work remains, especially to move away from the
unsuitable SQLite storage. We think that putting the Bitbucket donation
to use is the best way to make sure this work gets done soon.