caching pull - stable partitioning of bundle requests

Mon Apr 15 10:55:03 UTC 2019

On 3/3/19 10:01 PM, Pulkit Goyal wrote:
> 
> 
> On Tue, Feb 19, 2019 at 3:23 PM Boris FELD <boris.feld at octobus.net 
> <mailto:boris.feld at octobus.net>> wrote:
> 
>     On 15/02/2019 21:16, Pulkit Goyal wrote:
>>
>>
>>     On Thu, Oct 4, 2018 at 6:16 PM Boris FELD <lothiraldan at gmail.com
>>     <mailto:lothiraldan at gmail.com>> wrote:
>>
>>         The road for moving this into Core is clear, but not short. So
>>         far we have not been able to free the necessary time to do it.
>>         Between the paying-client work we have to do to pay salaries
>>         (and keep users happy) and all the time we are already
>>         investing in the community, we are already fairly busy.
>>
>>
>>         In early 2017, Bitbucket gave $13,500 to the Mercurial
>>         project to be spent to help evolution move forward. As far
>>         as we know, this money is still unspent. Since stable range is
>>         a critical part of obsmarkers discovery, unlocking this money
>>         to be spent on upstreaming stable range would be a good idea
>>         (and fits its initial purpose). Paying for this kind of work
>>         will reduce the contention with client work and help us, or
>>         others, to dedicate time to it sooner rather than later.
>>
>>
>>     I definitely agree that obsmarker discovery is a critical part.
>>     Pulling from `hg-committed` is sometimes slower than pulling
>>     from a repo 5-7x the size of hg-committed whose server has
>>     thousands of heads.
>>
>>     Do you have any updates on the stable-range cache? In its current
>>     state the cache is pretty big and a lot of people have faced
>>     problems with the cache size. Also, after a strip or some other
>>     commands, it rebuilds the cache, which takes more than 10 minutes
>>     on a large repo and is definitely a bummer. Are you working on
>>     making it better and smaller? How is the experimentation with
>>     evolve+stablerange going?
> 
>     # Regarding the cache-size:
> 
>     We know that the current version caches many entries that are
>     trivial to compute and do not need to be cached. In addition, the
>     current storage (SQLite) does not seem very efficient.
> 
>     So the next iteration of the cache should be significantly smaller.
> 
> 
>     # Regarding cache invalidation:
> 
>     A lot of the data in the caches are an inherent property of the
>     changeset and therefore immutable. It's easy to preserve them during
>     strip to avoid having to recompute things from scratch. In addition,
>     this immutable data should be exchanged during pull alongside the
>     associated changesets to avoid clients recomputing the same data over
>     and over.
> 
>     The current implementation is an experimental/research
>     implementation; all this should get smoothed directly in Core during
>     the upstreaming.
> 
> 
> I am a bit confused when you say "things should get smoothed directly in 
> Core during the upstreaming". Which one of the following did you mean:
> 
> 1) send a patch of the current implementation to core and, once that
> patch gets in, try to improve the implementation in core
> 2) send a series to core which contains patch of the current 
> implementation and other patches improving the implementation
> 
> 2) is the same as what Augie did for narrow, linelog and remotefilelog,
> what Greg did for sparse, and what I did for infinitepush.
> 
> Which one do you mean here?

Mostly a variation of the second one. The code currently in evolve has a
lot of extra, unnecessary complexity that comes from the fact that it
lives in an extension and that it was initially research code that evolved
toward the current solution. Overall, we can keep the main algorithm and
reimplement more of what is around it directly in core. In particular, most
of the immutable properties we compute could go directly into a new version
of the revlog index to make their storage trivial.
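To illustrate why immutable per-changeset data is cheap to keep, here is a
minimal sketch in Python, assuming a toy in-memory DAG rather than a real
changelog. Depth (the number of ancestors of a changeset) is used only as an
example of an immutable property; `DepthCache`, `export` and `merge` are
hypothetical names, not evolve's or Mercurial's API.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy DAG: node id -> parent node ids. Real Mercurial nodes
# are 20-byte hashes stored in the changelog; strings keep the sketch short.
Graph = Dict[str, Tuple[str, ...]]


def ancestors(graph: Graph, node: str) -> Set[str]:
    """Return `node` plus every ancestor reachable through its parents."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


class DepthCache:
    """Hypothetical node-keyed cache of one immutable changeset property.

    "Depth" (the number of ancestors, counting the changeset itself) only
    depends on the changeset's history, which its hash pins down, so a
    cached value can never become stale.
    """

    def __init__(self) -> None:
        self._depths: Dict[str, int] = {}

    def get(self, graph: Graph, node: str) -> int:
        # Compute once, then serve from the cache.
        if node not in self._depths:
            self._depths[node] = len(ancestors(graph, node))
        return self._depths[node]

    def strip(self, removed: Iterable[str]) -> None:
        # Strip only forgets the removed nodes; every other entry is
        # still correct, so nothing has to be recomputed.
        for node in removed:
            self._depths.pop(node, None)

    def export(self, nodes: Iterable[str]) -> List[Tuple[str, int]]:
        # Entries a server could send alongside the pulled changesets.
        return [(n, self._depths[n]) for n in nodes if n in self._depths]

    def merge(self, entries: Iterable[Tuple[str, int]]) -> None:
        # Received entries are stored as-is instead of being recomputed.
        for node, depth in entries:
            self._depths.setdefault(node, depth)


# Demo: "d" is a merge of "b" and "c", which both descend from "a".
graph: Graph = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c")}
server = DepthCache()
for node in graph:
    server.get(graph, node)  # the server computes each depth once
client = DepthCache()
client.merge(server.export(graph))  # shipped with the changesets on pull
client.strip(["d"])  # stripping "d" leaves the other entries untouched
```

Keying entries by node rather than by local revision number is what makes
this work: a strip renumbers the revisions that come after the stripped
ones, which would invalidate a revision-indexed cache, while node-keyed
entries for the surviving changesets stay valid.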

For the record, Joerg Sonnenberger looked at this during the mini-sprint
and we discussed how it could be applied to exchange arbitrary notes for
changesets (e.g. a new tags mechanism, code signing, CI status, etc.)

Cheers,

-- 
Pierre-Yves David