Linkrev perf

Fri Sep 30 12:22:11 EDT 2016

Excerpts from Durham Goode's message of 2016-09-30 08:31:17 -0700:
> On 9/30/16 8:20 AM, Jun Wu wrote:
> > Suppose the repo has linear history, the only reason we need .d walk is
> > because of obsoleted changesets.
> I don't think we can assume the repo is linear though.  Even for a repo 
> without merges, if a user performs a graft onto a release branch, that 

note: I should have added "single head" as a requirement of being "linear". 
But the approach should be helpful even if the repo is not linear, see
below.

> can result in duplicate linkrevs without any obs markers being created. 
> (and at that point we can't blindly rewrite the linkrev if a future obs 
> marker comes along, because we don't know what other commits have file 
> nodes that use that linkrev).

If we do nothing when obs markers are being created and filerev points to
obsoleted changesets, we will *always* fallback to .d walk later. This
change will reduce the chance from "always" to "maybe".

The graft case is a valid concern. But it could be detected easily: only
rewrite the linkrev to the successor, if the successor modifies (introduces)
the file. That's just a quick "if file in changelog[succ].files" test.
Otherwise we can either do a .d walk (should be cheaper than usually as we
expect the first changeset modifying the file will match, if the history is
linear) or do not rewrite the linkrev this time for better perf (i.e.
assuming the graft case happens much less common then the obsolete case).

> > Consider the simple amend case, where 6 is replaced by 7:
> >
> >    o    7 : filerev=1
> >    |
> >    | x  6 : filerev=1        o  1 : linkrev=6
> >    |/                        |
> >    o    5 : filerev=0        o  0 : linkrev=5
> >    changelog                 filelog
> >
> > It's obvious that if we rewrite linkrev from 6 to 7 in this case, we no
> > longer need .d walk, since the new linkrev will be in the visible branch.
> >
> > Hence I suggest rewrite linkrev when creating an obsolete marker: if the
> > linkrev was pointed to a obsoleted changeset, and one of the successor
> > has a same filerev, rewrite the linkrev to the earliest successor with a
> > same filerev.
> Rewrite it where? In the revlog?  That sounds dangerous.  Other 
> algorithms currently depend on the linkrev pointing at the earliest 
> instance, so I don't think we'd want to break that invariant entirely.

Yeah, if the revlog must be strictly "append-only", we need to find a
different place to write the info (like an extra map, but that increases the
complexity significantly).

That's also why I mentioned "asynchronously", let a "fsck"-like "cleanup"
program do the job. In fact, since it's just touching 4 bytes of a file
without moving content around, I think it should be much safer then things
like "strip". 

In general, the high-level ideas here are:
1. linkrev points to a hidden revision is probably always not what we want
2. if we can know a better "linkrev" easily, we should record it one
   somewhere.
3. the better "linkrev" in these cases are good enough for linear repos.
4. it seems hooking the "obsolete marker creation" is brilliant and handles
   a lot of real world cases, like it fits well with "pullcreatemarkers".

Whether we do it directly at revlog or not is implementation details. I'd
prefer changing revlog since it's simple, strightforward and fits with other
parts well.