[PATCH 1 of 2 v8] graft: support grafting across move/copy (issue4028)

Tue Sep 13 09:37:07 EDT 2016

>

--------------------------------------------------------------------------
This message, including its attachments, is confidential. For more information please read NNG's email policy here:
http://www.nng.com/emailpolicy/
By responding to this email you accept the email policy.

-----Original Message-----
> From: Pierre-Yves David [mailto:pierre-yves.david at ens-lyon.org]
> Sent: Tuesday, September 13, 2016 11:42 AM
> To: Gábor STEFANIK <Gabor.STEFANIK at nng.com>; Yuya Nishihara
> <yuya at tcha.org>
> Cc: mercurial-devel at mercurial-scm.org
> Subject: Re: [PATCH 1 of 2 v8] graft: support grafting across move/copy
> (issue4028)
>
>
>
> On 09/12/2016 12:13 PM, Gábor STEFANIK wrote:
> > -----Original Message-----
> >> From: Yuya Nishihara [mailto:youjah at gmail.com] On Behalf Of Yuya
> >> Nishihara
> >> Sent: Sunday, September 11, 2016 6:29 PM
> >> To: Gábor STEFANIK <Gabor.STEFANIK at nng.com>
> >> Cc: mercurial-devel at mercurial-scm.org
> >> Subject: Re: [PATCH 1 of 2 v8] graft: support grafting across
> >> move/copy
> >> (issue4028)
> >>
> >> On Fri, 26 Aug 2016 11:16:31 -0500, Gábor Stefanik wrote:
> >>> # HG changeset patch
> >>> # User Gábor Stefanik <gabor.stefanik at nng.com> # Date 1472225958 -
> 7200
> >>> #      Fri Aug 26 17:39:18 2016 +0200
> >>> # Node ID f32aa28298164aa38830e83369c57c9553c6ff08
> >>> # Parent  318e2b600b80e4ed3c6f37df46ec7544f60d4c0b
> >>> graft: support grafting across move/copy (issue4028)
> >>
> >> Sorry for late reply, and I don't fully understand how this patch
> >> handles reverse copy tracking correctly yet. But weekend is over, so
> >> I just wrote random comments.
> >>
> >>> +    # In certain scenarios (e.g. graft, update or rebase), ca can
> >>> + be
> >> overridden
> >>> +    # We still need to know a real common ancestor in this case
> >>> +    # We can't just compute _c1.ancestor(_c2) and compare it to ca,
> >> because
> >>> +    # there can be multiple common ancestors, e.g. in case of bidmerge.
> >>> +    cta = ca
> >>> +    # ca.descendant(wc) and ca.descendant(ca) are False, work around
> that
> >>> +    _c1 = c1.p1() if c1.rev() is None else c1
> >>> +    _c2 = c2.p1() if c2.rev() is None else c2
> >>> +    dirty_c1 = not (ca == _c1 or ca.descendant(_c1))
> >>> +    dirty_c2 = not (ca == _c2 or ca.descendant(_c2))
> >>> +    graft = dirty_c1 or dirty_c2
> >>> +    if graft:
> >>> +        cta = _c1.ancestor(_c2)
> >>
> >> Can we know if we're doing graft-type merge beforehand? I think
> >> ca.descendant() isn't fast so it should be avoided for normal merges.
> >
> > This has been argued repeatedly. Basically the only way you can know in
> advance that your merge is going to be graftlike is by  doing a few
> descendant() calls yourself. So, with the exception of the "hg merge"
> command (which is guaranteed to yield ungraftlike merges), all commands
> wishing to do a merge will have to run through this whole descendant game.
> >
> > In the first few versions of the patches, I actually did graftlikeness
> detection in mergemod.graft(), but it was a nightmare to get it to work
> properly, and as it turns out, graft() isn't the only thing doing a graftlike
> merge. I was informed that calling descendant() once in a command is fine,
> it's only too slow for calling in a loop.
> >
> > I don't think complicating the code, potentially missing some edge cases of
> unusual commands doing graftlike merges, and breaking extension
> compatibility is warranted for a few ms speedup of "hg merge".
>
> Given how often you had to explain this, it is probably worth adding
> something about this in the inline comment about it.

Will do, for the sake of posterity.

The actual discussion I believe happened on either the mailing list or on IRC.
I was specifically assured that a small constant number of descendant() calls is OK.

>
> >>>      # find interesting file sets from manifests
> >>> +    if graft:
> >>> +        repo.ui.debug("  computing unmatched files in rotated
> >>> + DAG\n")
> >>>      addedinm1 = m1.filesnotin(ma)
> >>>      addedinm2 = m2.filesnotin(ma)
> >>> -    u1, u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
> >>> +    _u1, _u2 = _computenonoverlap(repo, c1, c2, addedinm1,
> addedinm2)
> >>> +    if not graft:
> >>> +        u1, u2 = _u1, _u2
> >>> +    else: # need to recompute this for directory move handling when
> >> grafting
> >>> +        repo.ui.debug("  computing unmatched files in unrotated DAG\n")
> >>> +        u1, u2 = _computenonoverlap(repo, c1, c2, m1.filesnotin(mta),
> >>> +
> >>> + m2.filesnotin(mta))
> >>> +
> >>>      bothnew = sorted(addedinm1 & addedinm2)
> >>>
> >>>      for f in u1:
> >>> -        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1)
> >>> +        checkcopies(c1, f, m1, m2, ca, cta, dirty_c1, limit, diverge, copy1,
> >>> +                    fullcopy1, incomplete1, incompletediverge)
> >>>
> >>>      for f in u2:
> >>> -        checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2)
> >>> +        checkcopies(c2, f, m2, m1, ca, cta, dirty_c2, limit, diverge, copy2,
> >>> +                    fullcopy2, incomplete2, incompletediverge)
> >>
> >> [snip]
> >>
> >>> +    # combine partial copy paths discovered in the previous step
> >>> +    copyfrom, copyto = incomplete1, incomplete2
> >>> +    if dirty_c1:
> >>> +        copyfrom, copyto = incomplete2, incomplete1
> >>> +    for f in copyfrom:
> >>> +        if f in copyto:
> >>> +            copy[copyto[f]] = copyfrom[f]
> >>> +            del copyto[f]
> >>> +    for f in incompletediverge:
> >>> +        assert f not in diverge
> >>> +        ic = incompletediverge[f]
> >>> +        if ic[0] in copyto:
> >>> +            diverge[f] = [copyto[ic[0]], ic[1]]
> >>
> >> According to Matt's comment, we need two copy traces split at 'ca',
> >> but we use ones split at 'cta' (and somewhat 'ca' is taken into
> >> account?), because it wouldn't be easy to track copies backwards.
> >>
> >> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-
> >> August/086915.html
> >>
> >> I guess that are addressed these "incomplete" dicts and "if"s in
> >> checkcopies(), but I wonder if there are unhandled cases such as
> >> non-linear DAG to be rotated, which might include "ypoc". I'm not sure,
> though.
> >
> > See the tests. Virtually every imaginable case is tested.
> >
> > Matt actually wanted 3 copy traces, one between "cta" and "ca", one from
> one parent to "ca", and one from the other parent to "cta". The problem
> with this approach is that checkcopies can't just stop after going "behind"
> some cutoff revision, since it's operating in a low-level way in which "behind"
> doesn't really make sense. This is presumably for perf reasons. As a result,
> the checkcopies pass going from a parent to "ca" will actually go back to "cta",
> and find spurious copies. We could perhaps identify and remove those
> spurious copies by comparing the output to that of the ca->cta pass, but then
> we would need post-processing as complex as what this patch has to
> accomplish that, so we win nothing by going 3-pass.
>
> Okay this is the first time I read bout this 'behind' thing, I need to investigate
> that but that should probably extensibility documented as a comment.

The existing comment (and the signature of the function) implies this heavily, as there
is actually no parameter provided for identifying a "bottom" revision.

The ca parameter doesn't serve as a stopping point, because the common ancestor's
manifest may actually contain a file from a much earlier revision. Same for the new cta.

There is a limit parameter, but it uses integer arithmetic on revision numbers, which
don't work well in a rotated DAG. Also, limit is apparently only a performance optimization,
and it provides no guarantees that it won't visit "uninteresting" revisions. This is because
in the 2-pass scenario, visiting an uninteresting revision just wastes time, but with 3 passes,
it's possible to accidentally visit a revision that would be uninteresting in the current pass,
but interesting in another, in which case copies in that revision will be recorded in the
wrong direction. Getting rid of that requires either rewriting the limit logic completely
to eliminate the usage of integer arithmetic (potentially severely slowing down rename detection),
or adding filter logic to get rid of these spurious reversed copies, which would be more complex
than the current filtering and combination logic for 2-pass.

Maybe document that limit is only an optimization, and doesn't provide guarantees?

>
> --
> Pierre-Yves David