[PATCH] merge: add resolve --prep, which outputs conflicted comparison files

Mon Feb 27 20:59:28 EST 2017

Excerpts from Durham Goode's message of 2017-02-27 17:39:10 -0800:
> 
> On 2/27/17 4:58 PM, Jun Wu wrote:
> > Excerpts from Durham Goode's message of 2017-02-27 16:37:56 -0800:
> >> Do you mean outputting the actual file content in the json? I was
> >> thinking of having just the paths to the file contents be in the json.
> >> That way we don't have to json escape a bunch of binary data if we
> >> output conflicting binary files (which is an eventual use case).
> >
> > Yes. Having the content available is better in my opinion:
> >
> >   1. No filesystem race condition
> >   2. Stateless. No need to clean up files
> >
> > For binary files, we can skip them in some way. That's also one reason why
> > the json schema is hard to define, and why I think Python merge functions is
> > a cleaner solution.
> 
> K, I'd be fine streaming the raw files as part of the json until we find 
> a reason not to (like huge conflicts or binary resolutions or something).
> 
> >> For raw vs slice, I think having the four raw files (yours, mine, base,
> >> output file which starts with conflict markers in it) makes the most
> >> sense from the point of view of a merge tool consuming this data.  I
> >> don't think we should require the consumer to parse the conflict marker
> >> file.  I'm not even sure they could 100% correctly parse the file if the
> >> file contained text that looked like conflict markers but was actually
> >> part of the file.
> >
> > So you are suggesting raw files without markers? That makes sense from a
> > cleaner API perspective. But it's less practical - it requires extra efforts
> > from the upper-layer application to figure out what could be cleanly merged
> > (other and local have the same change).
> 
> I think the consumer application is usually going to be a merge tool 
> anyway, so I'm not too concerned with them figuring out the conflict. 
> But even so, the 'output' file I suggest would initially contain the 
> conflict markers so the tool could present that to the user if necessary.
> 
> >
> > I never suggested put conflict markers into file contents. The "slices"
> > format is defined precisely by JSON - something similar to the output of
> > google's diff-match-patch library [1], but with 3 sides.
> >
> > [1]: https://github.com/zchee/google-diff-match-patch/wiki/API#initialization
> 
> I think starting with the normal other/local/base/output raw text is 
> probably sufficient for most resolve use cases, since that's enough for 
> the current merge tools.  If we come up with a use case, we can add 
> internal:dump-json-slices or something fancier.
> 
> So, do you think it would be sufficient to make the following changes to 
> the current patch?
> 
> 1. Remove --prep
> 2. Add internal:dump-json as a tool that uses this code path.  'hg 
> resolve --tool internal:dump-json' doesn't actually resolve anything, 
> but outputs the json to stdout.

Yes.

> 3. Return raw file contents in the json instead of file paths. So a 
> format like:
> 
> {
> "conflicts": {
>    "path/to/file1": {
>      "base_content": "xxxx",
>      "original_content": "yyyy",
>      "other_content": "zzzz",
>      "conflictmarker_content": "aaaa",

Maybe without "conflictmarker_content" - the merge tool could have its own
resolution state (like, entirely in the IDE memory) where the working copy
state is meaningless to it.

>    }
> }

I think the following metadata needs to be exposed for base, local, and
other, respectively:

  - executable bit
  - symlink bit

I'd arrange the JSON a bit differently so keys are fixed, and items are
stored in an array:

  [
    {'base': {'path': ...,
              'content': ...,
              'isexec': ...,
              'issymlink': ... },
     'other': {'path': ...,
               'content': ...
               'isexec': ...,
               'issymlink': ...},
     'local': { .... },
     'resolved': false },
    {'base': ....,
     'other': ....,
     'local': ....,
     'resolved': .... },
    ....
  ]

In this way, it's easier to define the schema using json-schema.org
specification [1] if we want it eventually, and the values of 'base',
'other', 'local' could be described using a single schema definition.

[1]: http://json-schema.org/

> That gives us some room to add file paths later with different key names 
> if necessary.  It also renames "workingcopy" to be 
> "conflictmarker_content" since it's now raw content. The caller can know 
> where to write it's results based on the "path/to/file1".