[PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

Martin von Zweigbergk martinvonz at google.com
Mon May 15 17:19:03 EDT 2017


On Mon, May 15, 2017 at 1:39 PM, Durham Goode <durham at fb.com> wrote:
>
>
> On 5/15/17 1:32 PM, Martin von Zweigbergk wrote:
>>
>> On Mon, May 8, 2017 at 11:40 AM, Durham Goode <durham at fb.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode <durham at fb.com>
>>> # Date 1494268523 25200
>>> #      Mon May 08 11:35:23 2017 -0700
>>> # Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
>>> # Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
>>> strip: make tree stripping O(changes) instead of O(repo)
>>>
>>> The old tree stripping logic iterated over every tree revlog in the repo
>>> looking
>>> for commits that had revs to be stripped. That's very inefficient in
>>> large
>>> repos. Instead, let's look at what files are touched by the strip and
>>> only
>>> inspect those revlogs.
>>
>>
>> Sorry, I didn't notice this patch until today (because bisection of a
>> test failure in narrowhg pointed me to it). This patch makes me a
>> little worried that it'll have the same bugs as the initial version of
>> changegroup generation for treemanifests had, i.e. that it forgets
>> that merge commits can affect no files, but still affect directories
>> (fixed in commit 1ac8ce13). I haven't tried to see if I can make it
>> fail with this patch applied (the narrowhg failures was unrelated).
>
>
> Ug, good point. I think this patch will fail in that case. Iterating over
> all the revlogs really isn't a scalable option though, so it sounds like
> maybe we need to actually do a walk of the trees for manifests that are
> being stripped.  Like, for each tree being stripped, diff them with their
> parents and return all directories that are new.

Yep, that's exactly what I did for changegroup generation in 1ac8ce13.

For now, could you send a patch to back this change out and I'll queue it?

>
> We could probably add an argument to walksubtrees that only yields subtrees
> that are different from the argument tree. This type of logic is useful for
> determining what trees need to be bundled too. We do something like this in
> our native implementation.
>
>
>>>
>>> diff --git a/mercurial/repair.py b/mercurial/repair.py
>>> --- a/mercurial/repair.py
>>> +++ b/mercurial/repair.py
>>> @@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
>>>  def striptrees(repo, tr, striprev, files):
>>>      if 'treemanifest' in repo.requirements: # safe but unnecessary
>>>                                              # otherwise
>>> -        for unencoded, encoded, size in repo.store.datafiles():
>>> -            if (unencoded.startswith('meta/') and
>>> -                unencoded.endswith('00manifest.i')):
>>> -                dir = unencoded[5:-12]
>>> -                repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
>>> +        treerevlog = repo.manifestlog._revlog
>>> +        for dir in util.dirs(files):
>>
>>
>> Note that util.dirs() does not return the root directory (I've often
>> wanted to change that, and I may send a patch soon), so do you need to
>> manually include it here?
>
>
> The root manifest is already handled by the normal strip mechanism, so no
> need to handle it here.


More information about the Mercurial-devel mailing list