[PATCH 1 of 2 v9] graft: support grafting across move/copy (issue4028)

Mon Oct 3 11:16:07 EDT 2016

On Mon, 3 Oct 2016 10:28:37 +0000, Gábor STEFANIK wrote:
> > > +    _c1 = c1.p1() if c1.rev() is None else c1
> > > +    _c2 = c2.p1() if c2.rev() is None else c2
> > > +    dirty_c1 = not (ca == _c1 or ca.descendant(_c1))
> > > +    dirty_c2 = not (ca == _c2 or ca.descendant(_c2))
> > > +    graft = dirty_c1 or dirty_c2
> > > +    if graft:
> > > +        cta = _c1.ancestor(_c2)
> >
> > Can you fix ctx.descendant() to handle wctx? or maybe this could be
> >
> >   dirtyc<n> = ca != ca.ancestor(c<n>)  # for n = 1, 2
> >
> > and one of ca.ancestor(c<n>) would be cta. (I don't carefully investigate criss-
> > cross merge case, so I might be wrong.)
> 
> ctx.ancestor() always returns just one common ancestor, not all possible candidates.

Yes. My guess was ca.ancestor(c1) (not c1.ancestor(c2)) would have only one
common ancestor if it is ca, but I didn't think that deeply.

> So unfortunately there is no way to eliminate the use of descendant() here, nor do I feel
> that there is a need to do so. descendant() is still faster than ancestor() or especially
> commonancestorsheads().

Anyway, if descendant() is faster than ancestor(), no need to take a slow path.

> > > -    u1, u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
> > > +    _u1, _u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
> >
> > Can you give them better names? '_' prefix in local variables generally means
> > they are placeholder (unused output) variables. (Perhaps we can start with a
> > trivial patch that separates u1/2 and _u1/2, which will describe why we need
> > them.)
> 
> u1r, u2r, u1u, u2u (for rotated/unrotated)? I'm wary of using longer names, as many lines
> in copies.py are already borderline for the 80-character rule, and will require reformatting
> with the introduction of longer variable names.

Shorter names seem good so long as they are consistent.

> > And it would be nice if there's a brief comment when to use ca/ma over
> > cta/mta.
> >
> > > -    seen = set([f])
> > > -    for oc in getfctx(f, m1[f]).ancestors():
> > > +    seen = {f: [getfctx(f, m1[f])]}
> > > +    for oc in seen[f][0].ancestors():
> > >          ocr = oc.linkrev()
> > >          of = oc.path()
> > >          if of in seen:
> > > +            seen[of].append(oc)
> > >              # check limit late - grab last rename before
> > >              if ocr < limit:
> > >                  break
> > >              continue
> > > -        seen.add(of)
> > > +        seen[of] = [oc]
> >
> > nit: maybe we only need a dict of file nodes, not a dict of file contexts?
> 
> how do I get a file node id from a file context? ancestors() gives us contexts.

fctx.filenode()

See also basefilectx.__eq__() to make sure if using .filenode() is appropriate
compared to the current code.

> > > -        fullcopy[f] = of # remember for dir rename detection
> > > +        # remember for dir rename detection
> > > +        if backwards:
> > > +            fullcopy[of] = f # grafting backwards through renames
> > > +        else:
> > > +            fullcopy[f] = of
> > >          if of not in m2:
> > >              continue # no match, keep looking
> > >          if m2[of] == ma.get(of):
> > > -            break # no merge needed, quit early
> > > +            return # no merge needed, quit early
> >
> > So we no longer set diverge[of] = [f] this case. I don't know if it was
> > necessary to populate 'renamedelete', but this change seems good for a
> > separate patch to make sure it never break anything.
> 
> This isn't just some optimization, it's necessary for correct behavior of the
> new checkcopies. And it would break the old checkcopies if applied separately
> before the main patch.

Can you add a test that should fail if we only do s/break/return/ ?
Unfortunately, all tests passed if I tried that.

> Is that really necessary? It's a lot of additional work for me, to find intermediate states that may work,
> and fix each of them so they don't break anything in the testsuite, only for those fixes to be reverted
> by the next patch in the series. It would be the digital equipment of digging and filling pits in the ground.
> 
> Splitting isn't the hard part; ensuring no breakage in the intermediate states is.

If that requires lots of unwanted intermediate states, that wouldn't be what
we should take. I guess this patch could be split to a) a few trivial changes
plus b) one (or two) core change. Hopefully (a) will reduce noises in (b).

(I'm not skilled around this code, so please feel free to ignore my comments
if they seem wrong.)