[PATCH] obsolete: add operation metadata to rebase/amend/histedit obsmarkers

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu May 18 04:46:36 EDT 2017

TL;DR; There seems to be significant confusion remaining around the goal 
and constraint of what we aims to do here. The email discussion is 
moving slowly, I think we should switch to a more synchronous medium.

On 05/13/2017 11:16 PM, Gregory Szorc wrote:
> On Wed, May 10, 2017 at 1:08 PM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> wrote:
>     On 05/10/2017 05:45 PM, Durham Goode wrote:
>         On 5/10/17 3:40 AM, Pierre-Yves David wrote:
>             On 05/10/2017 01:34 AM, Durham Goode wrote:
>                 # HG changeset patch
>                 # User Durham Goode <durham at fb.com <mailto:durham at fb.com>>
>                 # Date 1494372571 25200
>                 #      Tue May 09 16:29:31 2017 -0700
>                 # Node ID 22051b0924bcbc5cde6d25d700dcce6afcaafd98
>                 # Parent  8f1a2b848b52ea7bf3fe2404e3b62924c7aae93f
>                 obsolete: add operation metadata to
>                 rebase/amend/histedit obsmarkers
>             TL;DR; Storing more information is useful, but the
>             pre-existing plans
>             aims at storing slightly different data in a different way.
>             We should
>             stick to it.
>             The idea is to use the obsmarker bitfields to store the data
>             and record
>             a pre-defined set of "effects" in each markers. (eg:
>             "content change,
>             message change, parent change")
>             For examples:
>             1) a `hg amend` that update the diff will create a markers
>             recording the
>             "content changes".
>             2) a `hg amend -e` that only update he commit message will
>             record the
>             "message change".
>             3) a `hg rebase` will record the "parent change".
>             4) a `hg amend` that update the message, the content and
>             user will
>             record these three things (content, message and meta changes).
>             These data can then be used to have nice template as shown
>             by Durham
>             (eg: is X is a successors of Y and the markers as some
>             "content change"
>             bit, we can display "amended as Y"; if their are some parent
>             change, we
>             can display "rebased as Y", if both are present "amended and
>             rebased as
>             Y").
>         While having the above data seems nice, I don't know if it
>         justifies not
>         going with the current simple approach today. Otherwise we have
>         to wait
>         for this large obsstore refactor, which seems like a rather big
>         dependency to block an important UX feature we could deliver today.
>     There seems to be a major misunderstanding here.
>     Obsmarkers -already- has a bitfield (for many years) so are
>     -already- able to use them to record actions. There are no delay or
>     blocker here everything is already in place for it. There are
>     basically no reason to not just do it.
> This is true. And nobody is denying that using bit flags is more optimal.
> However, since we're not sure what we're doing with this feature yet,
> defining strings in the open-ended metadata field and then optimizing
> common values as bit flags later (possibly before 4.3 release) seems
> like a reasonable approach.

Using bit is not just about optimizing size for common values, it is 
about recording more usable value too. Command name cannot be combined 
(something we know we need) and has trouble carrying information across 
users using different interfaces (something already happening today).

The planned alternative (recording "effect" on operation) does not 
suffer from these two issues and are better suited for providing good 
information for team using evolution.

>         When the bit field is added in the future we can always change our
>         templates and messaging to use that data as well.  And if it's good
>         enough, we can drop the 'operation' metadata entirely, with little
>         compatibility issue.
>             The above approach as multiple advantages:
>               Compact: Since we use the (existing) bitfield, the storage
>             cost is
>             free (or minimal if we need to adds an extra byte of bitfield).
>             Accessing the data is also very cheap:
>               Compoundable: We'll often needs to combine multiple
>             markers to reports
>             changesets between two revisions (eg: we do not have the
>             intermediate
>             revisions locally). the bit field approach makes it trivial
>             to compound
>             the information. We can display display the same final
>             information when
>             the same result if obtained from different path (eg: from
>             two markers
>             [content change; message change] or from one markers
>             [content change |
>             message change]).
>               This part is important because simply using operation name
>             makes it
>             quite hard to combine the information.
>         Yes, if there were multiple operations we'd have to union them
>         just as
>         operation='modified' or something. Not ideal, but should be an
>         outlier
>         case.
>     Having multiple markers between local changesets is the common case
>     once people start exchanging draft changesets. We can not dismiss
>     this as an outliers, behaving properly here is an actual request
>     from the current users.
> I don't think this matters today. Perfect is the enemy of good. We can
> also hack around this other ways, such as having the presence of
> multiple markers obsoleting a specific changeset via different methods
> change the UI.

I'm a bit confused by this part of your reply, I'm not sure how to read 
it. Can you try to rephrase it to clarify again?

We know from field data and users request that it common to have a 
linear chain of markers between locally known changeset (chain involving 
locally unknown node). And team using evolution do requests better 
reporting in this case.

However, the part in your reply about "presence of multiple markers 
obsoleting a specific changeset" makes me think you are actually talking 
about something else.

>               UI Abstraction: since we record the "effect" information,
>             we abstract
>             the command names. This makes use more agile about the
>             actual UI. User
>             can use their own "hg myownrewritetool" command and still record
>             information useful for the other users.
>         I think recording the actual command the user ran is important.
>         The goal
>         of showing this in the UI is to let the user know what past action
>         caused this commit to become a new commit, so if they run 'hg
>         flabbernate' it's much more useful to show them 'fblabbernated
>         to be X'
>         than it is to show them 'amended to be X'. So potentially, even
>         if we
>         have the bitfield data, it could still be important to record
>         the actual
>         command that caused this obsmarker to be created (ex: "amended
>         to be X
>         by hg flabbernate")
>     Having more information is always useful, but this looks like
>     something to add as a second step.
>     The action-bit approach has clear advantages (no size concerns,
>     clear behavior with marker compositions), so this appears as a clear
>     good first step here.
>     In addition, if that first step turn out to not be enough and we
>     want more precise data we might want to look at leveraging the "hg
>     journal" local data for this.
>     I would recommend the following step forward:
>     1) add a simple "rewritten as" template,
>     2) record action-bit, use it in the template (and other related output),
>     3) gather data and look at the best way to issue more details.
> Since we're not sure what we're going to do yet, wasting limited bit
> flags when we have an open-ended metadata field is premature
> optimization. I support the approach in this patch. I also support
> converting things to bit flags before 4.3 if we establish confidence
> that the feature is working how we'd like.

We have to keep in mind that the total size of markers is an important 
factor here. There are already useful information excluded from markers 
for size reason (eg: parent information for non-prune). We have to keep 
being careful here and weight what extra data is the most important. 
Adding command name would increase markers size by tens of percent. 
Especially since markers will live forever in the project once created.

If exhausting bit flag is a concern, we can easily extend this bit 
field. For now, we could store this extension in the metadata mapping, 
and we would store them as an extra byte in the next obsstore format 
(that we'll need soon at least for hash agility anyway).
We could easily dedicate a whole byte to store the proposed action bit 
(and store it the marker extra for now, that would help adjusting bit 
value over time).


Pierre-Yves David

More information about the Mercurial-devel mailing list