[Bug 5771] New: Mercurial patch parser behavior for "--git" patches creating files with spaces differs from Git when the patch file omits trailing tabs

mercurial-bugs at mercurial-scm.org mercurial-bugs at mercurial-scm.org
Tue Jan 16 17:18:14 UTC 2018


https://bz.mercurial-scm.org/show_bug.cgi?id=5771

            Bug ID: 5771
           Summary: Mercurial patch parser behavior for "--git" patches
                    creating files with spaces differs from Git when the
                    patch file omits trailing tabs
           Product: Mercurial
           Version: default branch
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: selenic at yghe.net
                CC: mercurial-devel at mercurial-scm.org

This is a tangled mess that dates back to at least 1991 and not really a bug in
Mercurial, but I figured I'd discuss it upstream for completeness in case
someone else runs into it eventually or it's otherwise helpful.

Briefly: `hg import` can parse and apply Git patches, but the behavior of `hg
import` for certain inputs (patch files which are invalid, but look reasonable)
differs from the behavior of `git apply` for the same patch files.

When a change creates a file with a space in it, like `X Y.txt`, `git` emits a
patch with a trailing tab literal:

```
diff --git ...
...
+++ b/X Y.txt\t
...
```

All of `git apply`, `hg import`, and `patch` apply this patch correctly.

However, if the semantic, trailing tab literal is removed, `git apply` parses
the file path `X Y.txt` while Mercurial parses the file path `X`.

`patch` agrees with Mercurial, but this is a `diff --git`, so arguably
Mercurial should do what Git does instead. In this case, Git's behavior is also
what the user is likely to expect.

Since 2006 (see
<https://github.com/git/git/commit/1a9eb3b9d50367bee8fe85022684d812816fe531>)
there is no way to generate these patches directly with Git. They can only be
generated by damaging a patch file in a way that strips trailing tabs (e.g.,
maybe emailing it and then copy/pasting it) or by a third-party tool generating
synthetic patches and missing this implementation detail (as here; the need to
apply a trailing tab literal to files with spaces in the names evaded me when
implementing patch generation behavior in Phabricator).

Mercurial uses this logic to parse the filename (in `patch.py`):

```
def parsefilename(str):
    # --- filename \t|space stuff
    s = str[4:].rstrip('\r\n')
    i = s.find('\t')
    if i < 0:
        i = s.find(' ')
        if i < 0:
            return s
    return s[:i]
```

Simply removing the test for spaces would make the behavior match Git's
behavior in this specific, narrow case. However, it might change behavior in
other cases, and I wasn't able to dig up the provenance of this logic (it
vanishes into Chris Mason's `mpatch` library in 2007, and I can't find a
repository for that library since the Oracle repository no longer seems to
exist). So I don't know if this change is safe or reasonable. Since it requires
a herculean effort to even encounter this behavior, the change may not be
worthwhile.

This logic also executes for both `--git` patches and unified patches, and the
implementation might reasonably prefer to match the behavior of `patch` for
patches not marked as `--git`.

Realistically, you should probably just close this issue and forget about it.

See <https://secure.phabricator.com/T8768> for additional context about how we
encountered and resolved this and what Git and `diff` do and some guesses at
why.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the Mercurial-devel mailing list