Proposal for detecting history rewriting on shared repos

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu Feb 13 12:37:48 CST 2014



On 02/12/2014 07:14 PM, Gregory Szorc wrote:
> On 2/12/14, 4:24 PM, Pierre-Yves David wrote:
>>
>>
>> On 02/12/2014 04:20 PM, Gregory Szorc wrote:
>>> The share extension and workflow is very fragile. If rewriting occurs on
>>> the original repository, there's a good chance shared clones of that
>>> repo will get corrupted. While there is a giant warning in the output of
>>> `hg help share` to warn you about this, Mercurial currently offers
>>> little to no assistance to detect and recover from this.
>>
>> […]
>>
>>> Thoughts?
>>
>> The branch cache has logic to detect non-append-only operations on the
>> view. The same kind of logic should be applicable here.
>>
>> I know that the current cache key generated by branchcache has some
>> weaknesses in some corner cases. If you hit them, feel free to improve it.
>
> I didn't realize that code existed!
>
> It sounds like you are proposing storing a hash of some set of revlog
> data (possibly the revs or nodes of the changelog) as the store ID. I
> think this could work. You're essentially proposing a direct test vs an
> indirect one. The indirect one, while faster, relies on code paths being
> complete or else we miss updates.

I am also advocating reuse and improvement of existing logic instead of 
creating a new mechanism for every use case. I'm not certain it covers 
your use case, but it is worth having a look.

> The current branch cache code is computing a hash over all filtered
> revs. I /think/ that because the branch cache doesn't care about
> filtered revs that it can get away with computing just the revs and not
> nodes?

Not sure I understand. We have multiple branchcaches for different levels 
of filtering, so the cache key has to include information about which 
revs were excluded from the computation. This should not be relevant for 
your case.
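
To illustrate what "including the excluded revs in the key" could look
like, here is a minimal standalone sketch. It is not the actual
branchcache code: `filtered_hash`, `cache_key`, and the list-of-nodes
model of the changelog are assumptions made up for this example.

```python
import hashlib


def filtered_hash(filtered_revs, tiprev):
    """Hash the excluded revs at or below tiprev; None if there are none."""
    revs = sorted(r for r in filtered_revs if r <= tiprev)
    if not revs:
        return None
    h = hashlib.sha1()
    for r in revs:
        h.update(b"%d;" % r)
    return h.hexdigest()


def cache_key(nodes, filtered_revs):
    """Build a key for one filtered view of a toy changelog.

    nodes: list of node ids indexed by rev (hypothetical model).
    filtered_revs: set of revs hidden from this view.
    Returns (tiprev, tipnode, hash of excluded revs).
    """
    visible = [r for r in range(len(nodes)) if r not in filtered_revs]
    tiprev = visible[-1] if visible else -1
    tipnode = nodes[tiprev] if visible else None
    return (tiprev, tipnode, filtered_hash(filtered_revs, tiprev))
```

Two views with the same tip but different hidden revs get different
keys, which is the only reason the excluded revs are part of the key.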

> For the share case, I /think/ we would need to hash nodes so history
> rewriting that doesn't change rev count won't fall through a crack.

The branch map cache reasons (mostly) on nodes only and is not 
affected by rev ordering (the very same content, but a different order 
of the nodes). Is the share extension affected by it? (I believe the 
question can be reduced to "does the working copy data store anything 
related to revs?")
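
A toy comparison of the three kinds of key discussed here, as a sketch
only. The changelog is modelled as a plain list of node strings, and
`count_key`, `ordered_key`, and `unordered_key` are hypothetical names,
not Mercurial functions.

```python
import hashlib


def count_key(nodes):
    """Key based only on the number of revs."""
    return len(nodes)


def ordered_key(nodes):
    """Hash of the nodes in rev order: sensitive to content and order."""
    h = hashlib.sha1()
    for n in nodes:
        h.update(n.encode("ascii") + b"\n")
    return h.hexdigest()


def unordered_key(nodes):
    """Hash of the sorted nodes: sensitive to content, not to rev order."""
    h = hashlib.sha1()
    for n in sorted(nodes):
        h.update(n.encode("ascii") + b"\n")
    return h.hexdigest()


original = ["n0", "n1", "n2"]
amended = ["n0", "n1", "n2b"]    # tip rewritten, rev count unchanged
reordered = ["n0", "n2", "n1"]   # same nodes, different rev numbers
```

With this model, `count_key` cannot tell any of the three apart,
`unordered_key` catches the rewrite but treats the reordering as the
same repo, and `ordered_key` catches both. Which one the share case
needs depends on the answer to the question above.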

> If a repo has changed since last open, we'll need to scan most of at
> least the changelog index to get all the nodes. That's a few MB of I/O
> on open assuming a repo with over 100k commits (my Firefox repo has a 12
> MB 00changelog.i). It should hopefully be in the page cache. But still -
> that could add up. Is this acceptable? Paging the changelog index
> doesn't happen normally, does it?

No. The point of the branchmap key is to be able to tell whether the 
last stored key still applies to the current repo without too much 
recomputation. The key is then updated with just the new data added 
since the last key was built.
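
Roughly, the idea can be sketched as follows. This is a simplified
illustration, not the real branchmap code: the `IncrementalKey` class,
its `seen` counter (a stand-in for whatever derived data is cached),
and the list-of-nodes changelog are all assumptions for the example.
Validating only the stored tip entry is a deliberate simplification.

```python
class IncrementalKey:
    """Remember the last tip seen and only process revs added after it."""

    def __init__(self):
        self.tiprev = -1
        self.tipnode = None
        self.seen = 0  # stand-in for the cached, derived data

    def is_valid(self, nodes):
        """Cheap check: is the stored tip still at the same rev?"""
        if self.tiprev == -1:
            return True
        return self.tiprev < len(nodes) and nodes[self.tiprev] == self.tipnode

    def update(self, nodes):
        """Bring the key up to date; return how many revs were processed."""
        if not self.is_valid(nodes):
            # Non-append-only change detected: start over from scratch.
            self.tiprev, self.tipnode, self.seen = -1, None, 0
        new = nodes[self.tiprev + 1:]
        self.seen += len(new)
        if nodes:
            self.tiprev = len(nodes) - 1
            self.tipnode = nodes[-1]
        return len(new)
```

In the append-only case the check reads a single entry and the update
touches only the new revs; a full rescan happens only when the stored
tip no longer matches.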

-- 
Pierre-Yves David

