Proposal for detecting history rewriting on shared repos
pierre-yves.david at ens-lyon.org
Thu Feb 13 12:37:48 CST 2014
On 02/12/2014 07:14 PM, Gregory Szorc wrote:
> On 2/12/14, 4:24 PM, Pierre-Yves David wrote:
>> On 02/12/2014 04:20 PM, Gregory Szorc wrote:
>>> The share extension and workflow is very fragile. If rewriting occurs on
>>> the original repository, there's a good chance shared clones of that
>>> repo will get corrupted. While there is a giant warning in the output of
>>> `hg help share` to warn you about this, Mercurial currently offers
>>> little to no assistance to detect and recover from this.
>> The branch cache has logic to detect non-append-only operations on the
>> view. The same kind of logic should be applicable here.
>> I know that the current cache key generated by branchcache has some
>> weaknesses in some corner cases. If you hit them, feel free to improve it.
> I didn't realize that code existed!
> It sounds like you are proposing storing a hash of some set of revlog
> data (possibly the revs or nodes of the changelog) as the store ID. I
> think this could work. You're essentially proposing a direct test vs an
> indirect one. The indirect one, while faster, relies on code paths being
> complete or else we miss updates.
I am also advocating reuse/improvement of existing logic instead of
creating a new mechanism for every use case. I'm not certain it covers
your use case, but it is worth having a look.
> The current branch cache code is computing a hash over all filtered
> revs. I /think/ that because the branch cache doesn't care about
> filtered revs that it can get away with computing just the revs and not
Not sure I understand. We have multiple branchcaches for different levels
of filtering. So the cache key has to include information about which
revs were excluded in the computation. This should not be relevant for
> For the share case, I /think/ we would need to hash nodes so history
> rewriting that doesn't change rev count won't fall through a crack.
The branch map cache reasons on nodes only (mostly) and is not
affected by rev ordering (the very same content, but a different order of
the nodes). Is the share extension affected by it? (I believe the question
can be reduced to "does the working copy data store anything related to
> If a repo has changed since last open, we'll need to scan most of at
> least the changelog index to get all the nodes. That's a few MB of I/O
> on open assuming a repo with over 100k commits (my Firefox repo has a 12
> MB 00changelog.i). It should hopefully be in the page cache. But still -
> that could add up. Is this acceptable? Paging the changelog index
> doesn't happen normally, does it?
No. The point of the branchmap key is to be able to know whether the last
stored key still applies to the current repo without too much
recomputation. The key is then updated with just the new data added
since the last key build.
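
To make the idea concrete, here is a minimal sketch of that kind of key
check. It is not the actual branchmap code: the names `CacheKey`,
`make_key` and `classify` are hypothetical, and the changelog is modelled
as a plain list of node ids indexed by rev. It ignores filtering, and it
shares the corner-case weaknesses mentioned above.

```python
from typing import NamedTuple, Sequence

NULLID = b"\0" * 20  # placeholder node id for an empty repository


class CacheKey(NamedTuple):
    """Hypothetical stored key: the last rev seen and the node at that rev."""
    tiprev: int
    tipnode: bytes


def make_key(nodes: Sequence[bytes]) -> CacheKey:
    """Build a key from the current list of nodes (index == rev)."""
    if not nodes:
        return CacheKey(-1, NULLID)
    return CacheKey(len(nodes) - 1, nodes[-1])


def classify(key: CacheKey, nodes: Sequence[bytes]) -> str:
    """Compare a stored key against the current nodes.

    Returns "unchanged", "appended" or "rewritten". Only the node at
    key.tiprev is looked up, so the whole index is never scanned.
    """
    if key.tiprev == -1:
        # The key was built on an empty repo: anything present is new.
        return "unchanged" if not nodes else "appended"
    if key.tiprev >= len(nodes) or nodes[key.tiprev] != key.tipnode:
        # The old tip is gone or replaced: a non-append-only operation.
        return "rewritten"
    if key.tiprev == len(nodes) - 1:
        return "unchanged"
    # The old tip is still in place; revs after it are new data.
    return "appended"
```

In the "appended" case only the revs after `key.tiprev` need to be
processed before storing a fresh key, which is what keeps the check cheap
on open.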