[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Tue May 23 05:47:33 EDT 2017

On 05/23/2017 04:22 AM, Jun Wu wrote:
> I got a chance to test this in production with one of our biggest repos
> and an 18MB obsstore. I tried running "hg id" with a minimal hgrc (with
> just lz4revlog and remotefilelog enabled) and without an existing obscache.
>
> dualsourcecache.update takes 9 seconds. 7 seconds are used to scan
> changelog, 0.36 seconds used to scan obsstore.
>
> The related profiling data is as below:
>
>   9735       | _computeobsoleteset              obsolete.py:1554
>   9421        \ update                          obsolete.py:1227
>   1872          \ _upgradeneeded                obsolete.py:1311
>    434            \ revs (many times)           changelog.py:319
>   1179            \ __iter__                    obsolete.py:564
>   1179             | __get__                    util.py:798
>   1177             | _all                       obsolete.py:707
>   1138             | _readmarkers               obsolete.py:444
>   1138             | _fm1readmarkers            obsolete.py:432
>   7549          \ _updatefrom                   obsolete.py:1433
> * 7185            \ _updaterevs                 obsolete.py:1439
>    766              \ __get__                   util.py:798
>    766               | successors               obsolete.py:717
>    766               | _addsuccessors           obsolete.py:506
> * 3964              \ node (many times)         changelog.py:359
>    364            \ _updatemarkers              obsolete.py:1471
>
> So it is expensive for us to rebuild the cache. It seems the cache may be
> dropped in some cases (ex. histedit --abort). I wonder if we can make the
> cache more persistent so it'll be impossible to be nuked entirely, like when
> doing a strip, just truncate the cache file to the strip point instead of
> nuking it.

The simplest solution I can see here is to back up the cache when
histedit/rebase starts and reinstall the cache when the operation is
aborted. That way we can incrementally upgrade from that backup point.
This approach can be generalized to other caches suffering from strip.
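
A minimal sketch of the backup idea, in plain Python rather than actual
Mercurial code. The helper name `preserved_cache` and the ".backup"
suffix are hypothetical; a real implementation would presumably hook
into the existing histedit/rebase abort logic instead.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def preserved_cache(cache_path):
    """Reinstall the cache file at cache_path if the wrapped operation fails.

    Hypothetical helper for illustration, not an existing Mercurial API.
    """
    backup_path = cache_path + ".backup"
    had_cache = os.path.exists(cache_path)
    if had_cache:
        # Back up the cache before the operation starts.
        shutil.copyfile(cache_path, backup_path)
    try:
        yield
    except BaseException:
        # Operation aborted: put the pre-operation cache back so that a
        # later update can proceed incrementally from that point.
        if had_cache:
            shutil.copyfile(backup_path, cache_path)
        raise
    finally:
        # The backup is only needed for the duration of the operation.
        if had_cache and os.path.exists(backup_path):
            os.unlink(backup_path)
```

Wrapping the operation in `with preserved_cache(path):` leaves the cache
untouched on success and restores the old copy on abort.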

Stripping the cache is also an option, but only in the case where only 
the changelog has been stripped (not the obsstore).
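
For illustration, truncation could look roughly like the sketch below.
It assumes a cache file holding a fixed-size header followed by one byte
per revision; that layout, `HEADER_SIZE`, and the function name are
assumptions made for the example. A real cache would also need its
stored cache key rewritten to match the truncated state.

```python
import os

# Hypothetical layout: HEADER_SIZE bytes of header, then one byte per revision.
HEADER_SIZE = 0


def truncate_cache_to_rev(cache_path, first_stripped_rev, obsstore_unchanged):
    """Drop cache entries for revisions >= first_stripped_rev.

    Returns True if the cache is still usable afterwards, False if the
    caller has to rebuild it from scratch.
    """
    if not obsstore_unchanged:
        # Markers were stripped too: the remaining entries may be stale.
        return False
    if not os.path.exists(cache_path):
        return False
    keep = HEADER_SIZE + first_stripped_rev
    if os.path.getsize(cache_path) > keep:
        with open(cache_path, "r+b") as f:
            f.truncate(keep)
    return True
```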

(Note: there are likely simple optimizations lurking in the update
function too. For example, if len(obsstore) << len(repo), we could
iterate over the obsstore only when rebuilding the cache, etc.)
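
A rough sketch of that optimization, on a simplified data model rather
than the real obsstore/changelog objects. All names here are
hypothetical: `precursors` stands for the set of nodes rewritten by some
marker, `rev_of` for a node-to-rev lookup, `node_of` for a rev-to-node
lookup. Phases are ignored for brevity.

```python
def build_obsolete_flags(repo_len, node_of, rev_of, precursors):
    """Return a bytearray with flags[rev] == 1 for revs with a marker.

    repo_len   -- number of revisions in the changelog
    node_of    -- callable mapping rev -> node
    rev_of     -- dict-like mapping node -> rev (missing for unknown nodes)
    precursors -- set of nodes that appear as a precursor in a marker
    """
    flags = bytearray(repo_len)
    if len(precursors) < repo_len:
        # Few markers: walk the obsstore side and look up each node.
        for node in precursors:
            rev = rev_of.get(node)
            if rev is not None:
                flags[rev] = 1
    else:
        # Many markers: walk the changelog side instead.
        for rev in range(repo_len):
            if node_of(rev) in precursors:
                flags[rev] = 1
    return flags
```

Both branches produce the same result; the point is only that the first
one avoids calling node() for every revision in the repository.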

Do these details address your concerns?

Cheers,

-- 
Pierre-Yves David