Fixing patch import with win32text, issue 1019

Patrick Mézard pmezard at gmail.com
Wed Jun 3 17:01:55 CDT 2009


Patrick Mézard a écrit :
> For every patched file:
> 
> if "all hunks are deletion hunks":
>     # We really don't care about line endings when deleting stuff
>     # or removing files
>     ignore EOLs and patch
> else if "hunks do not have mixed EOLs":
>     # Looks like EOLs are not important for this change.
>     if "patched file contains at least one EOL":
>         if "patched file has a well defined EOL":
>             # Be tolerant
>             ignore EOL, patch, normalize toward patched file EOL.
>         else:
>             # This is a mess, a warning might be good.
>             patch as usual
>     else:
>         # This is a file addition, or we are editing a file with a
>         # single line without EOL.
> 	by default, patch as usual, but give an option to create file 
>         in a specific EOL by default (possibly set by win32ext).
> else:
>     # Patch with mixed EOLs, chances are they are meaningful.
>     patch as usual

The problem with this method is the patch code works on an enumeration of hunks. It does not know whether all hunks are deletion ones or if all of them have the same EOLs and so forth. But it does not matter much, here is another version:

for every patched file:
for every hunk:

if "all old lines and all new lines have the same EOLs":
    if "patched file has well defined EOLs":
        normalize new lines to patched file EOL and patch
    else:
        if "patched file has at least two EOLs":
            # This is a mess
            if "there no new lines":
                # Deletion hunk, ignore EOLs
                ignore EOLs and patch
            else:
                warn
                patch as usual
        else:
            # This is a file addition, or we are editing a file with a
            # single line without EOL.
            if "win32ext and co":
                normalize new lines to selected EOL and patch
            else:
                patch as usual
else:
    # EOLs probably matter
    patch as usual


The differences are:
- Deletion hunks are now clearly a special case where we allow patching to proceed whether EOLs match or not.
- Files can be patched with some hunks being normalized and others not, but only when either the original file or the patched file would end with mixed EOLs. Worse case being something like: file is LF, hunk1 has oldlines and newlines in CRLF, hunk2 has oldlines in LF and newlines in CRLF. The heuristic would convert hunk1 to LF and patch, and apply hunk2 as such. Without the heuristic, patching would have rejected hunk1. I don't know how relevant this is. We can always warn when normalizing.

To implement it we must be able to:
1- Tell if patched file "has a well defined EOL": we already load it in memory, can be done.
2- Tell if patched file has at least two EOLs: same as [1]
3- Tell if old lines and new lines have same EOLs: hunk objects contain both sets in memory, can be done.
4- Ignore EOLs and patch:
  a- Fix patchfile.findlines(): for example in self.hashlines we can hash by line without EOL, and store the EOL in the hash. A successful match would depend on the presence of the line and sometimes of the correct EOL. Consumes more memory but could work.
  b- Fix diffhelpers.testhunk(): add an argument to ignore EOLs, and return a special code when it does so we can issue warnings.
  c- EOL normalization of new lines is trivial.

In the end, it does not look that horrible, even if I expect a lot of ugly corner cases to surface when we start hacking it.

--
Patrick Mézard


More information about the Mercurial-devel mailing list