[PATCH 1 of 4 V3] filemerge: add `summarize()`

Mon Mar 20 06:43:30 EDT 2017

A bunch of comments inline. One more round should do it I think.

On 3/17/17 2:17 AM, Phil Cohen wrote:
> # HG changeset patch
> # User Phil Cohen <phillco at fb.com>
> # Date 1489716177 25200
> #      Thu Mar 16 19:02:57 2017 -0700
> # Node ID ad81d182e88ffd3da20364f18204facf6e7750a3
> # Parent  568d80b24b3a3e4e6874ac0d95ddee056fef14e4
> filemerge: add `summarize()`
>
> returns a dictionary of each version of a conflict, given the contexts

While the first line should be all lowercase, I don't think that holds 
for the following paragraph. See 
https://www.mercurial-scm.org/wiki/ContributingChanges#Example_patch for 
an example. Note how the description after the first line uses full 
sentences, capitalization, and punctuation.

For a patch that adds a function that isn't used yet, it is customary to 
explain how you are planning to use it. It doesn't have to be long. 
Something like "future patches will use this function to build 
machine-readable batch merge status output" might work.
>
> diff --git a/mercurial/filemerge.py b/mercurial/filemerge.py
> --- a/mercurial/filemerge.py
> +++ b/mercurial/filemerge.py
> @@ -7,6 +7,7 @@
>   
>   from __future__ import absolute_import
>   
> +import copy
>   import filecmp
>   import os
>   import re
> @@ -567,6 +568,60 @@
>           "o": " [%s]" % labels[1],
>       }
>   
> +def summarize(repo, workingfilectx, otherctx, basectx):

Excellent names!

> +    origfile = None if workingfilectx.isabsent() else \
> +        scmutil.origpath(repo.ui, repo, repo.wjoin(workingfilectx.path()))

I haven't seen the \ used for line continuation in the hg codebase 
before, so I did a quick grep of the codebase and discovered that it's 
used in none of the main python source files (only in setup.py of the 
python files I grepped).

I'd suggest using parens to do the continuation:

     origfile = (None if workingfilectx.isabsent() else
         scmutil.origpath(repo.ui, repo, repo.wjoin(workingfilectx.path())))

> +
> +    def flags(context):
> +        if isinstance(context, absentfilectx):
> +            return {'exists': False}
> +        return {
> +            'contents': context.data(),
> +            'exists': True,
> +            'isexec': context.isexec(),
> +            'issymlink': context.islink(),
> +        }

The way this is done, users of this API will need to check exists all 
the time or risk KeyErrors -- this will lead to ugly user-facing bugs in 
implementations that use this API. Instead, I'd suggest that we always 
return all fields. In the absent case, I'd return:

         return {
             'contents': None,
             'exists': False,
             'isexec': None,
             'issymlink': None,
         }

This way, users of the API that are "lazy" will get "basically true" 
information: a non-existent file has no content, is not executable, and 
is not a symlink. However, they won't have to gate every check on 
'exists', and if they pay attention to types, they will recognize that 
in the JSON, null !== false, and that also gives them additional 
information without checking the exists field.

> +
> +    output = flags(workingfilectx)
> +
> +    if origfile and util.filestat(origfile).stat:
> +        # Since you can start a merge with a dirty working copy (either via
> +        # `up -n` or `merge -f`), "local" must reflect that, not the underlying

update -n is a facebook thing. Outside of FB, hg update will happily put 
you into a merge without you needing to pass any flags.

> +        # commit. Those contents are available in the .orig version, so we look

Upstream, "changeset" is the word we use instead of "commit" (we (ab)use 
the translation system to change it from "changeset" to "commit" here at 
FB).

> +        # there and mock up the schema to look like the other contexts.
> +        local = {
> +            'contents': util.readfile(origfile),
> +            'exists': True,
> +            'isexec': util.isexec(origfile),
> +            'issymlink': util.statislink(util.filestat(origfile).stat),

It looks like this will cause two stats when we need only one. Instead, 
move the filestat call up and assign it to a variable, then use 
util.statisexec() for the exec bit.

> +        }
> +    else:
> +        # No backup file. This happens whenever the merge was esoteric enough
> +        # that we didn't launch a merge tool*, and instead prompted the user to
> +        # "use (c)hanged version, (d)elete, or leave (u)nresolved".
> +        #
> +        # The only way to exit that prompt with a conflict is to choose "u",
> +        # which leaves the local version in the working copy (with all its
> +        # pre-merge properties including any local changes), so we can reuse
> +        # that.
> +        #
> +        # Another alternative might be to use repo['.'][path] but that wouldn't
> +        # have any dirty pre-merge changes.
> +        #
> +        # *If we had, we'd've we would've overwritten the working copy, made a
> +        # backup and hit the above case.

Great comment!

> +        local = copy.copy(output)

Why do we need a copy here? Is it just to protect a caller from 
accidents like munging the returned dict and accidentally having it 
munge both output and local? Let's add a comment about why we're doing 
this copy.

> +
> +    output['path'] = repo.wjoin(workingfilectx.path())
> +
> +    return {
> +        'base': flags(basectx),
> +        'local': local,
> +        'other': flags(otherctx),
> +        'output': output,
> +        'path': workingfilectx.path(),
> +    }
> +
>   def _filemerge(premerge, repo, mynode, orig, fcd, fco, fca, labels=None):
>       """perform a 3-way merge in the working directory
>   
>