[PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)
Martin von Zweigbergk
martinvonz at google.com
Mon May 15 17:19:03 EDT 2017
On Mon, May 15, 2017 at 1:39 PM, Durham Goode <durham at fb.com> wrote:
>
>
> On 5/15/17 1:32 PM, Martin von Zweigbergk wrote:
>>
>> On Mon, May 8, 2017 at 11:40 AM, Durham Goode <durham at fb.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode <durham at fb.com>
>>> # Date 1494268523 25200
>>> # Mon May 08 11:35:23 2017 -0700
>>> # Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
>>> # Parent 5dec5907fe49a488d3ade272d4a5cf090914e59c
>>> strip: make tree stripping O(changes) instead of O(repo)
>>>
>>> The old tree stripping logic iterated over every tree revlog in the repo
>>> looking
>>> for commits that had revs to be stripped. That's very inefficient in
>>> large
>>> repos. Instead, let's look at what files are touched by the strip and
>>> only
>>> inspect those revlogs.
>>
>>
>> Sorry, I didn't notice this patch until today (because bisection of a
>> test failure in narrowhg pointed me to it). This patch makes me a
>> little worried that it'll have the same bugs as the initial version of
>> changegroup generation for treemanifests had, i.e. that it forgets
>> that merge commits can affect no files, but still affect directories
>> (fixed in commit 1ac8ce13). I haven't tried to see if I can make it
>> fail with this patch applied (the narrowhg failures was unrelated).
>
>
> Ug, good point. I think this patch will fail in that case. Iterating over
> all the revlogs really isn't a scalable option though, so it sounds like
> maybe we need to actually do a walk of the trees for manifests that are
> being stripped. Like, for each tree being stripped, diff them with their
> parents and return all directories that are new.
Yep, that's exactly what I did for changegroup generation in 1ac8ce13.
For now, could you send a patch to back this change out and I'll queue it?
>
> We could probably add an argument to walksubtrees that only yields subtrees
> that are different from the argument tree. This type of logic is useful for
> determining what trees need to be bundled too. We do something like this in
> our native implementation.
>
>
>>>
>>> diff --git a/mercurial/repair.py b/mercurial/repair.py
>>> --- a/mercurial/repair.py
>>> +++ b/mercurial/repair.py
>>> @@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
>>> def striptrees(repo, tr, striprev, files):
>>> if 'treemanifest' in repo.requirements: # safe but unnecessary
>>> # otherwise
>>> - for unencoded, encoded, size in repo.store.datafiles():
>>> - if (unencoded.startswith('meta/') and
>>> - unencoded.endswith('00manifest.i')):
>>> - dir = unencoded[5:-12]
>>> - repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
>>> + treerevlog = repo.manifestlog._revlog
>>> + for dir in util.dirs(files):
>>
>>
>> Note that util.dirs() does not return the root directory (I've often
>> wanted to change that, and I may send a patch soon), so do you need to
>> manually include it here?
>
>
> The root manifest is already handled by the normal strip mechanism, so no
> need to handle it here.
More information about the Mercurial-devel
mailing list