Desired use case for obsmarkers / visibility

Tue Nov 14 12:10:28 EST 2017

On Mon, 2017-11-13 at 09:03 -0800, Gregory Szorc wrote:
> On Fri, Nov 3, 2017 at 12:34 PM, Boris Feld <boris.feld at octobus.net>
> wrote:
> > On Thu, 2017-11-02 at 10:06 -0700, Gregory Szorc wrote:
> > > I have a potential use case for obsmarkers / visibility that I
> > > want to run by people to see if it can be supported.
> > >
> > > Changesets are pushed to the Firefox repo via a landing service.
> > > This service essentially imports changesets [submitted by the
> > > author] and rebases them onto the repo head.
> > >
> > > Today, when your changesets are landed, you need to perform
> > > "garbage collection" on your local repo to remove the old
> > > versions of changesets. We want landed changesets to disappear
> > > after `hg pull` picks up the rebased versions.
> > >
> > > This is a pretty straightforward scenario and is supported by
> > > obsmarkers today. If we enabled the writing of obsolescence
> > > markers in the landing service, things would essentially "just
> > > work."
> > >
> > > Here's where things get a little more complicated.
> > >
> > > When changesets are landed to the Firefox repo today, they are
> > > first pushed to an "integration" repository. Logically, this can
> > > be modeled as a single repo divided into public and draft parts.
> > > e.g.
> > >
> > > o D (draft) (head)
> > > o C (draft)
> > > o B (public)
> > > o A (public) (root)
> > >
> > > When our CI says a changeset is "good," it is promoted to public.
> > > e.g.
> > >
> > > o D (draft)
> > > o C (public) (formerly draft)
> > > o B (public)
> > > o A (public) (root)
> > >
> > > Today, when we encounter a "bad" changeset, we perform a backout.
> > > e.g.
> > >
> > > o D' (draft) (backout of D)
> > > o D (draft)
> > > o C (public)
> > > o B (public)
> > > o A (public) (root)
> > >
> > > Given our push velocity, it is common to have intermediary
> > > changesets land before a changeset is identified as "bad." This
> > > means there are changesets between the initial landings and its
> > > backout. e.g.
> > >
> > > o D' (draft) (backout of D)
> > > o E (draft)
> > > o D (draft)
> > > o C (public)
> > > o B (public)
> > > o A (public) (root)
> > >
> > > The repo with the backouts is eventually published and the final
> > > history of the repo is littered with "bad" changesets and
> > > backouts. This causes all kinds of problems for bisection,
> > > annotate, file history, etc.
> > >
> > > Instead of performing backouts and leaving the final repo history
> > > in a sub-optimal state, we want to instead "drop" "bad"
> > > changesets before they are published. e.g.
> > >
> > > o E' (draft) (rebased from discarded D to C)
> > > |     x D (draft) (discarded)
> > > o C (public)
> > > o B (public)
> > > o A (public) (root)
> > >
> > > Since we can identify "bad" changesets relatively quickly, this
> > > would enable us to remove the vast majority of backouts and "bad"
> > > changesets from the final, published repo history.
> > >
> > > Again, obsolescence as it exists today facilitates this. We can
> > > perform these drops via `hg histedit` (or similar) and the
> > > appropriate "prune" obsmarkers are written so the canonical repo
> > > has the appropriate final history.
> > >
> > > However, the way it works today isn't friendly to end-user
> > > workflows.
> > >
> > > If we were to deploy this, the following would happen:
> > >
> > > 1) User creates changeset X and submits for landing.
> > > 2) Landing service rebases to X' and writes X->X' marker.
> > > 3) X' turns out to be bad and is dropped. X'->null marker is
> > > written to convey the prune.
> > > 4) User pulls and sees X->X'->null and hides X because its most
> > > recent successor is pruned.
> > > 5) User is left wondering what happened to X. They possibly
> > > forget they need to fix and reland X.
> > >
> > > This is bad UX. What we want to happen instead is:
> > >
> > > a) User pulls after X' drop and X is still visible.
> > > b) Something else happens and some form of X remains
> > > visible/accessible to user
> > >
> > > The server can't expose X' because everyone would see it. We have
> > > 1 head per repo and don't want to be exposing random "bad"
> > > changesets to everyone. This seems to rule out the traditional
> > > evolve solution of "touch" a changeset to revive a prune because
> > > I'm not sure how we'd send X' to only the user that cares about
> > > it. There's also no way in obsolescence today to unhide X once it
> > > has been obsoleted.
> > >
> > > In the obsmarker world of today, the best solution I can think of
> > > is "delete obsmarkers on the server." If we discarded the X->X'
> > > marker (or didn't write it until X' became public), the
> > > end-user's original changeset X wouldn't be hidden on pull
> > > because there is no marker on the server referencing X. But this
> > > approach feels hacky and is extra server-side complexity, which
> > > I'd prefer to avoid.
> > 
> > First, `hg strip` gets rid of X' and the obsmarkers for you, but
> > that is more hacky and traumatic for the repository than you will
> > want (especially the caches).
> > 
> 
> We have a strong desire to avoid strip. We have a writable master
> server and separate read-only mirrors. Performing a strip means
> having to replicate that strip. And when the repo is being stripped,
> recent changesets may not be available in the revlog. This would
> almost certainly cause intermittent failures in CI since Mercurial
> doesn't have read locks. We'd have to mark mirrors as offline when
> they are doing strips or resort to some other hackery. It would be
> ugly.
>  
> > Fortunately there are a couple of other low-tech solutions
> > available with today's implementation:
> >
> >
> > For your use case, if people are barely pulling from the
> > integration repository, the simplest might be to turn the dropped
> > changeset secret (instead of pruning it). That way, it is no
> > longer exchanged (nor are the associated obsolescence markers
> > between X and X').
> >
> > This is a simple approach available today. However, that won't
> > help people who already pulled from -integration.
> 
> Regarding people who have already pulled from the integration repo,
> assume it is rare for people to do this. If the original author pulls
> while X' is still visible, I'm fine with that original author having
> to revive X manually. I care mostly about the common case where the
> original author doesn't pull from the integration repo.
>  

I quickly mentioned a secret-based workflow; here is how it would work:

    - You create a stack to test with A, B, and C.
    - B proves to be bad, so C is rebased on top of A (which creates an
      obsmarker).
    - B is made secret. Marking it as secret means that this changeset
      and the attached obsmarkers will not be shared anymore.
    - Publish A and C.
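
As a rough illustration of the "not shared anymore" effect, here is a
toy Python model of what a client would receive on pull. This is not
Mercurial's API: `Marker`, `pull_view` and `hidden_after_pull` are names
made up for this sketch, and the rule for withholding markers is a
deliberate simplification of the real exchange logic. It uses the X / X'
names from earlier in the thread, plus a second changeset Y that landed
fine as Y'.

```python
from dataclasses import dataclass

PUBLIC, DRAFT, SECRET = "public", "draft", "secret"


@dataclass(frozen=True)
class Marker:
    precursor: str
    successors: tuple  # an empty tuple would mean "pruned"


def pull_view(server_phases, server_markers):
    """Toy model: what a client receives when pulling from the server."""
    # Secret changesets are never exchanged.
    changesets = {n for n, p in server_phases.items() if p != SECRET}
    # Simplifying assumption: a marker is withheld when one of its
    # successors is secret on the server.
    markers = [m for m in server_markers
               if all(server_phases.get(s) != SECRET for s in m.successors)]
    return changesets, markers


def hidden_after_pull(local_drafts, pulled_markers):
    """Local drafts that get hidden because a pulled marker obsoletes them."""
    obsoleted = {m.precursor for m in pulled_markers}
    return set(local_drafts) & obsoleted


# X' was rejected and made secret; Y' is a normal landed draft.
server_phases = {"A": PUBLIC, "X'": SECRET, "Y'": DRAFT}
server_markers = [Marker("X", ("X'",)), Marker("Y", ("Y'",))]

changesets, markers = pull_view(server_phases, server_markers)
hidden = hidden_after_pull({"X", "Y"}, markers)
print(sorted(changesets), markers, sorted(hidden))
```

In this model the pull only carries A, Y' and the Y->Y' marker, so the
author's local X stays visible while Y is hidden as usual.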

The main challenge for this approach is propagating the secret phase
to the read-only mirrors (if they already have the changeset as draft).
Is the replication done using a simple pull, or does it use your
Kafka-based bundle replication system?
* If pull-based, you could have a hook turning outgoing changesets
  secret.
* If Kafka+bundle based, you need a tiny extension to carry the forced
  phase move to secret.

> > 
> > To have X' disappear from other people's repositories while still
> > being nice to the original author, you can use an extra
> > "-rejected" repository.
> >
> >   0) the user pushed X⁰, which got rebased as X¹ but needs to be
> >      dropped,
> >   1) push X¹ to mozilla-rejected,
> >   2) 'touch' X¹ into X² (inside mozilla-rejected),
> >   3) prune X¹ in mozilla-integration,
> >   4) send an email to the user with the reason for the rejection
> >      and the URL to pull X' again.
> >
> > Step (3) means there will be a prune marker to be pulled by
> > everybody (from integration). But step (2) ensures there is a
> > successor that the original user can use to keep working.
> >
> > I agree it is not the most elegant solution, but it is easily
> > available today.
> > 
> 
> This doesn't seem that bad. We would end up with a -rejected repo
> containing tons of heads. As long as we told the user exactly which
> head to pull, they'd only receive the rewrites to the relevant
> changesets.
> 
> However, this solution still results in the original author "losing"
> X by default (which is the main problem I'm trying to prevent). It is
> nice they can get X' back by pulling from the rejected repo. But they
> need to explicitly do that, which is no sure thing.
>  
> > As an extra step (5), -rejected could detect any content-divergent
> > version of something that got dropped. The new successor would be
> > the next version from the legitimate owner. An obsmarker can then
> > be automatically created from X¹ to that newer version, X³.
> > 
> > >
> > > I /think/ the new visibility work proposed by Jun, Durham, and
> > > others might offer some solutions to this problem. Rather than
> > > speculate based on my limited knowledge of that proposal, I am
> > > hoping someone with more knowledge could weigh in more
> > > definitively.
> > >
> > > It's worth noting that in our proposed workflow, the
> > > "integration" changesets that are rewritten exist in a separate
> > > repository that most people don't pull from. This means we could
> > > potentially break some "rules" about how obsmarkers work since
> > > few would notice. But we do pull the "integration" repo into the
> > > "stable" repo. So presumably obsmarkers would propagate to the
> > > "stable" repo and be pulled by people, where they could cause
> > > problems.
> > >
> > > Is there a solution to this use case? FWIW, I think a solution
> > > would have use beyond Mozilla's walls: I'm pretty sure a lot of
> > > people would love for the final history of their repo to be
> > > cleaner. It's just that today's VCS tools (including Git), don't
> > > handle distributed history rewriting very gracefully, which
> > > discourages people from having nice things.
> > 
> > 
> > I hope one of the low-tech solutions offered above solves your
> > immediate problem and lets you move forward with your idea.
> >
> > Getting something better than the low-tech solutions offered above
> > is going to be a bit harder. The main issue is that we want to
> > actually hide the dropped changeset from everyone except a few
> > people (the author/owner). The definition of that "owner" is not
> > something we have really clarified yet.
> >
> > If we clarify this ownership question, there are two different
> > approaches we can explore in your case.
> >
> > Server side solution:
> >
> >      Variant of the 'secret' approach. The X² changeset is not
> >      served to people unless they are recognized as "owning it"
> >      during the discovery phase.
> > 
> 
> This requires some kind of auth with the server. Since our servers
> all do anonymous access, that won't fly.
>  
> > Client side solution:
> > 
> >      Changesets owned by the local repository, but pruned by a
> >      non-owner, don't get hidden until something happens (the owner
> >      actually obsoletes them or something; the non-owner pruning
> >      could be a flag in the obsmarkers, set at pruning time).
> 
> This sounds like an interesting avenue to pursue. I'm trying to come
> up with scenarios where someone would actually want their local
> changesets to be pruned (read: deleted) by another party (at least in
> their local view). I could see someone cleaning up cruft from another
> person on a shared repo. But if I'm the author, I really don't like
> things disappearing on me without my consent.
> 
> Also, I don't think we would need an explicit flag to track this.
> Obsmarkers have owners. So if the changeset's author doesn't match
> the prune obsmarker's owner, we have a non-author prune. Could the
> changeset's author then write a (redundant) prune obsmarker to hide
> it locally?
> 
> I haven't thought through the full implications of this idea. But it
> definitely sounds interesting - even outside the context of this
> thread!
> 

Using the changeset's author and the obsmarker's author to determine
the owner will be problematic in some cases. From actual field testing,
we have seen multiple cases where the "owner" does not match the actual
changeset author:

    - Developers with multiple addresses, e.g. we have tons of these in
      Mercurial itself (home vs work address).
    - Teams with one or two members comfortable with Mercurial, fixing
      situations for the others. They would be the ones cleaning up
      other people's changesets in the repository.
    - Workflows where the "owner" of a changeset changes over time as
      it progresses toward integration.

As a result, we think that having a flag to mark a non-author prune
would be more flexible. We can start by using the changeset ownership
to decide whether it should be set or not, but having the flag will
allow us to easily move to a wider logic as we go.
It would also help contain possible performance problems: decoding and
reading all obsmarker owners to compute visibility would not be cheap.
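
To make the flag idea a bit more concrete, here is a small Python
sketch of one possible client-side visibility rule. Everything in it is
hypothetical: the `non_owner_prune` field, the `Marker` class and
`is_hidden` are names invented for this example, not existing obsmarker
fields or Mercurial functions. It also assumes the local repository
owns X, and that a chain ending in a flagged prune should leave the
local changeset visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    precursor: str
    successors: tuple = ()         # empty tuple: prune marker
    non_owner_prune: bool = False  # hypothetical flag set at pruning time


def _ends_in_non_owner_prune(marker, markers, seen):
    """True if every path from `marker` ends in a flagged prune."""
    if not marker.successors:
        return marker.non_owner_prune
    for succ in marker.successors:
        if succ in seen:  # guard against marker cycles
            continue
        following = [m for m in markers if m.precursor == succ]
        if not following:
            return False  # a live successor exists
        if not all(_ends_in_non_owner_prune(m, markers, seen | {succ})
                   for m in following):
            return False
    return True


def is_hidden(node, markers, honor_flag=True):
    """Toy rule: is a locally owned changeset hidden by these markers?"""
    outgoing = [m for m in markers if m.precursor == node]
    if not outgoing:
        return False  # nothing obsoletes it
    if not honor_flag:
        return True   # today's behaviour: any marker hides it
    return not all(_ends_in_non_owner_prune(m, markers, {node})
                   for m in outgoing)


markers = [
    Marker("X", ("X'",)),                  # landing service rebase
    Marker("X'", non_owner_prune=True),    # dropped by the landing service
    Marker("Y", ("Y'",)),                  # landed normally
    Marker("Z"),                           # pruned by its owner
]

print(is_hidden("X", markers, honor_flag=False))
print(is_hidden("X", markers), is_hidden("Y", markers), is_hidden("Z", markers))
```

Here X is hidden under today's rule but stays visible once the flag is
honored, while Y and Z are hidden either way. Checking a single flag
also avoids having to decode the owner of every marker, which is the
performance concern mentioned above.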