[PATCH] merge: add resolve --prep, which outputs conflicted comparison files

Mon Feb 27 22:04:22 EST 2017

On 2/27/17 5:59 PM, Jun Wu wrote:
> Excerpts from Durham Goode's message of 2017-02-27 17:39:10 -0800:
>>
>> On 2/27/17 4:58 PM, Jun Wu wrote:
>>> Excerpts from Durham Goode's message of 2017-02-27 16:37:56 -0800:
>>>> Do you mean outputting the actual file content in the json? I was
>>>> thinking of having just the paths to the file contents be in the json.
>>>> That way we don't have to json escape a bunch of binary data if we
>>>> output conflicting binary files (which is an eventual use case).
>>>
>>> Yes. Having the content available is better in my opinion:
>>>
>>>   1. No filesystem race condition
>>>   2. Stateless. No need to clean up files
>>>
>>> For binary files, we can skip them in some way. That's also one reason why
>>> the json schema is hard to define, and why I think Python merge functions is
>>> a cleaner solution.
>>
>> K, I'd be fine streaming the raw files as part of the json until we find
>> a reason not to (like huge conflicts or binary resolutions or something).
>>
>>>> For raw vs slice, I think having the four raw files (yours, mine, base,
>>>> output file which starts with conflict markers in it) makes the most
>>>> sense from the point of view of a merge tool consuming this data.  I
>>>> don't think we should require the consumer to parse the conflict marker
>>>> file.  I'm not even sure they could 100% correctly parse the file if the
>>>> file contained text that looked like conflict markers but was actually
>>>> part of the file.
>>>
>>> So you are suggesting raw files without markers? That makes sense from a
>>> cleaner API perspective. But it's less practical - it requires extra efforts
>>> from the upper-layer application to figure out what could be cleanly merged
>>> (other and local have the same change).
>>
>> I think the consumer application is usually going to be a merge tool
>> anyway, so I'm not too concerned with them figuring out the conflict.
>> But even so, the 'output' file I suggest would initially contain the
>> conflict markers so the tool could present that to the user if necessary.
>>
>>>
>>> I never suggested put conflict markers into file contents. The "slices"
>>> format is defined precisely by JSON - something similar to the output of
>>> google's diff-match-patch library [1], but with 3 sides.
>>>
>>> [1]: https://github.com/zchee/google-diff-match-patch/wiki/API#initialization
>>
>> I think starting with the normal other/local/base/output raw text is
>> probably sufficient for most resolve use cases, since that's enough for
>> the current merge tools.  If we come up with a use case, we can add
>> internal:dump-json-slices or something fancier.
>>
>> So, do you think it would be sufficient to make the following changes to
>> the current patch?
>>
>> 1. Remove --prep
>> 2. Add internal:dump-json as a tool that uses this code path.  'hg
>> resolve --tool internal:dump-json' doesn't actually resolve anything,
>> but outputs the json to stdout.
>
> Yes.
>
>> 3. Return raw file contents in the json instead of file paths. So a
>> format like:
>>
>> {
>> "conflicts": {
>>    "path/to/file1": {
>>      "base_content": "xxxx",
>>      "original_content": "yyyy",
>>      "other_content": "zzzz",
>>      "conflictmarker_content": "aaaa",
>
> Maybe without "conflictmarker_content" - the merge tool could have its own
> resolution state (like, entirely in the IDE memory) where the working copy
> state is meaningless to it.
>
>>    }
>> }
>
> I think the following metadata needs to be exposed for base, local, and
> other, respectively:
>
>   - executable bit
>   - symlink bit
>
> I'd arrange the JSON a bit differently so keys are fixed, and items are
> stored in an array:
>
>   [
>     {'base': {'path': ...,
>               'content': ...,
>               'isexec': ...,
>               'issymlink': ... },
>      'other': {'path': ...,
>                'content': ...
>                'isexec': ...,
>                'issymlink': ...},
>      'local': { .... },
>      'resolved': false },
>     {'base': ....,
>      'other': ....,
>      'local': ....,
>      'resolved': .... },
>     ....
>   ]
>
> In this way, it's easier to define the schema using json-schema.org
> specification [1] if we want it eventually, and the values of 'base',
> 'other', 'local' could be described using a single schema definition.

Sounds reasonable.  We probably need "isdirectory" as well, to detect 
file vs directory changes.  And some way of indicating the file doesn't 
exists in the case of change-vs-deleted conflicts. Maybe by having the 
dictionary be empty.  i.e. `"other" : {}`