[PATCH 2 of 2 V3] treemanifest: make node reuse match flat manifest behavior

Martin von Zweigbergk martinvonz at google.com
Fri Mar 3 13:35:30 EST 2017


For now, I'm just adding back mercurial-devel that I accidentally
dropped in my reply (sorry, the default is different between Gmail and
Inbox, and I use both).

On Fri, Mar 3, 2017 at 10:05 AM, Durham Goode <durham at fb.com> wrote:
>
>
> On 3/3/17 9:25 AM, Martin von Zweigbergk wrote:
>>
>> On Wed, Mar 1, 2017 at 4:37 PM, Durham Goode <durham at fb.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode <durham at fb.com>
>>> # Date 1488413981 28800
>>> #      Wed Mar 01 16:19:41 2017 -0800
>>> # Node ID f99c019063bea8a0b5f66c1e79dbd0b98b1326d8
>>> # Parent  9c89185b63f574cad1d4009f6c9dd511a2cba986
>>> treemanifest: make node reuse match flat manifest behavior
>>>
>>> In a flat manifest, a node with the same content but different parents is
>>> still
>>> considered a new node. In the current tree manifests however, if the
>>> content is
>>> the same, we ignore the parents entirely and just reuse the existing
>>> node.
>>
>>
>> I think I copied that behavior from filelogs.
>
>
> filelog.add() calls straight to revlog.addrevision() which only exits early
> if the full p1+p2+data hash matches an existing full hash. In fact, it
> couldn't exit early for a content-only match because it would have to
> decompress the p1 content to even check. The only reason treemanifest could
> do it is because it happens to always have the p1 in memory and available
> for comparison.
>
>>>
>>> In our external treemanifest extension, we want to allow having one
>>> treemanifest
>>> for every flat manifests,
>>
>>
>> Nit: did you mean it the other way around? Because it sounds like you
>> can already have one treemanifest per flat manifest, but not one flat
>> manifest per treemanifest. But the next sentence says you want them
>> 1:1, so maybe add "and vice versa" to be clear, or "one unique
>> treemanifest per flat manifest".
>
>
> K, I can add the word unique to it.  When I said 'one x for every y' I meant
> len(x) == len(y), but maybe it's not so clear.
>
>>>  as a way of easeing the migration to treemanifests. To
>>> make this possible, let's change the root node treemanifest behavior to
>>> match
>>> the behavior for flat manifests, so we can have a 1:1 relationship.
>>
>>
>> Could you elaborate on what complications the current implementation
>> causes for you? It may not be necessary to put it in an updated commit
>> message, but I'd like to understand the reason better.
>
>
> As part of our migration to treemanifest, I'm backfilling all the trees for
> our entire repo, and I'm giving the root node the original flat-manifest
> hash, so eventually we can even delete the flat manifest and our old commits
> will still work. To make this possible, I need to have 1 unique treemanifest
> for every flat manifest, so all the old manifest hashes can still be
> referenced.
>
>> It's a little strange to me to use a different condition for the root
>> manifest. If we think that we should always emit a new revision if
>> either content or parents have changed, then maybe we should do that
>> for all directories. If we don't, it seems like we should not do it
>> for the root manifest either. In that case, I'd be happier to extract
>> a function or two that makes it easier for you guys to override it
>> internally instead.
>
>
> One difference between the root manifest and the sub-trees is that
> manifest.py does not control the p1/p2 for the root, but it does control the
> p1/p2 for the sub-trees.  If manifest.py controlled p1/p2 for the root, we
> could be smart and not produce a new tree when the content matches p1 or p2.
> But because an external caller provides p1/p2, if manifest.add(p1, p2,
> content) returns a node that doesn't actually have p1 and p2 as the parents
> (which is what can happen today), then we're not actually respecting what
> the caller asked us to do. That's why we need to respect it for the root,
> but not necessarily for the sub-trees.
>
> Usually Mercurial is good about not calling manifest.add(p1, p2, content)
> when the content matches p1, which is why we don't see a ton of empty
> manifest deltas in the repo. This does become an issue for merge commits
> though, where the code doesn't just want to throw out the p2 information so
> it calls mf.add(p1, p2, same_content), which is how we found the issue in
> the first place.
>
> Technically we could apply this behavior to every sub-tree as well, but that
> introduces complications. For instance, I think it'd mean we have to create
> a new empty sub-tree for every commit, since for every sub-tree the content
> is the same but the parents are not.
>
>>>
>>> While this sounds like a BC breakage, it's not actually a state users can
>>> normally get in because: A) you can't make empty commits, and B) even if
>>> you try
>>> to make an empty commit (by making a commit then amending it's changes
>>> away),
>>> the higher level commit logic in localrepo.commitctx() forces the commit
>>> to use
>>> the original p1 manifest node if no files were changed. So this would
>>> only
>>> affect extensions and automation that reached passed the normal
>>> localrepo.commit() logic straight into the manifest logic.
>>
>>
>> So what does the included test case do? There is no extension involved
>> there.
>
>
> It just verifies the existing behavior. I wrote it hoping it'd expose the
> bug, but it just worked, which led me notice the localrepo.commitctx logic.
>
>
>>
>>>
>>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>>> --- a/mercurial/manifest.py
>>> +++ b/mercurial/manifest.py
>>> @@ -1233,7 +1233,7 @@ class manifestrevlog(revlog.revlog):
>>>      def _addtree(self, m, transaction, link, m1, m2, readtree):
>>>          # If the manifest is unchanged compared to one parent,
>>>          # don't write a new revision
>>> -        if m.unmodifiedsince(m1) or m.unmodifiedsince(m2):
>>> +        if self._dir != '' and (m.unmodifiedsince(m1) or
>>> m.unmodifiedsince(m2)):
>>>              return m.node()
>>>          def writesubtree(subm, subp1, subp2):
>>>              sublog = self.dirlog(subm.dir())
>>> @@ -1241,13 +1241,17 @@ class manifestrevlog(revlog.revlog):
>>>                         readtree=readtree)
>>>          m.writesubtrees(m1, m2, writesubtree)
>>>          text = m.dirtext(self._usemanifestv2)
>>> -        # Double-check whether contents are unchanged to one parent
>>> -        if text == m1.dirtext(self._usemanifestv2):
>>> -            n = m1.node()
>>> -        elif text == m2.dirtext(self._usemanifestv2):
>>> -            n = m2.node()
>>> -        else:
>>> +        n = None
>>> +        if self._dir != '':
>>> +            # Double-check whether contents are unchanged to one parent
>>> +            if text == m1.dirtext(self._usemanifestv2):
>>> +                n = m1.node()
>>> +            elif text == m2.dirtext(self._usemanifestv2):
>>> +                n = m2.node()
>>> +
>>> +        if not n:
>>>              n = self.addrevision(text, transaction, link, m1.node(),
>>> m2.node())
>>> +
>>>          # Save nodeid so parent manifest can calculate its nodeid
>>>          m.setnode(n)
>>>          return n
>>> diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t
>>> --- a/tests/test-treemanifest.t
>>> +++ b/tests/test-treemanifest.t
>>> @@ -825,3 +825,13 @@ other branch
>>>    added 1 changesets with 1 changes to 1 files (+1 heads)
>>>    (run 'hg heads' to see heads, 'hg merge' to merge)
>>>
>>> +Committing a empty commit does not duplicate root treemanifest
>>> +  $ echo z >> z
>>> +  $ hg commit -Aqm 'pre-empty commit'
>>> +  $ hg rm z
>>> +  $ hg commit --amend -m 'empty commit'
>>> +  saved backup bundle to
>>> $TESTTMP/grafted-dir-repo-clone/.hg/strip-backup/cb99d5717cea-de37743b-amend-backup.hg
>>> (glob)
>>
>>
>> I slightly simpler way of creating a commit with not manifest changes:
>>
>> $ hg branch foo
>> $ hg ci -m 'new branch'
>>
>>> +  $ hg log -r 'tip + tip^' -T '{manifest}\n'
>>> +  1:678d3574b88c
>>> +  1:678d3574b88c
>>> +  $ hg --config extensions.strip= strip -r . -q


More information about the Mercurial-devel mailing list