Fixing patch import with win32text, issue 1019

Wed Jun 3 15:14:05 CDT 2009

Colin Caughie a écrit :
> Hi,
> 
> Following an IRC chat about this last night with mpm and pmezard I'd like to gather my thoughts on this, hopefully solicit some feedback and ideas and try to move this forward.
> 
> As pmezard pointed out, there are two issues, or perhaps one issue and one question:
> 
> 1. The issue is that "hg import" and extensions such as mq simply don't work with win32text filtering enabled; patches consistently fail to apply.
> 2. The question is: If this is fixed, what should "hg import" do about line endings, a) if the patch file contains just LFs, b) if it contains CRLFs or mixed endings?
> 
> Having thought a bit about this I think the only reasonable answer to 2. is:
> 
> Importing a patch in a repo with win32text should have the same effect as importing the same patch in a repo *without* win32text, then pulling to one with win32text.
> 
> (Or at least as near to the same effect as possible; the win32text filter is not lossless, so using it with files with mixed line endings is always going to lose information whether you're importing or not.)
> 
> Moreover I would expect the above to be true for any filter, even a daft one like an EBCDIC converter or something.
> 
> The only way I can see of making this work is to pass the repo object, or something with access to it, to patch.patch() and then on to the patchfile object, so that it can use the filters to encode the file before patching, then decode the result back to the working dir. This is what my patch does (latest version submitted 2009-05-23).
> 
> I can understand the reluctance to add new parameters to key functions, but from an architecture point of view it doesn't seem to go against the grain; the repo object is already passed to merge.update() et al and used in much the same way that I'm proposing.
> 
> Having said that if anyone has any suggestions on how to make this happen with less of an impact to the code, I'd be glad to hear them.
> 
> Colin

I will switch the discussion from the filter perspective to the EOL one.

The only reason we care about line endings with patches is sometimes we want to record line-endings changes. I cannot see any other relevant use case. So can we do the right thing automatically? There is no need to consider git extended operations like file deletion, copy, renaming, mode change, symlinks. Remain file creation and edition, represented by diff hunks.

For every patched file:

if "all hunks are deletion hunks":
    # We really don't care about line endings when deleting stuff
    # or removing files
    ignore EOLs and patch
else if "hunks do not have mixed EOLs":
    # Looks like EOLs are not important for this change.
    if "patched file contains at least one EOL":
        if "patched file has a well defined EOL":
            # Be tolerant
            ignore EOL, patch, normalize toward patched file EOL.
        else:
            # This is a mess, a warning might be good.
            patch as usual
    else:
        # This is a file addition, or we are editing a file with a
        # single line without EOL.
	by default, patch as usual, but give an option to create file 
        in a specific EOL by default (possibly set by win32ext).
else:
    # Patch with mixed EOLs, chances are they are meaningful.
    patch as usual

I don't know how hard it would be to implement but it sounds like what people would expect.

--
Patrick Mézard