[PATCH 0 of 1] Interest for convert option --contentsfilter?

Jason Harris jason at jasonfharris.com
Mon Apr 18 08:49:35 CDT 2011


On Apr 18, 2011, at 3:09 PM, Martin Geisler wrote:

> Jason Harris <jason at jasonfharris.com> writes:
> 
>> On Apr 18, 2011, at 1:35 PM, Martin Geisler wrote:
>> 
>>> Jason Harris <jason at jasonfharris.com> writes:
>>> 
>>> The encode/decode filters get the original file content on stdin and
>>> have to produce the new file content on stdout. Such a filter can do
>>> the above substitutions just fine. The only part missing is
>>> information about the current filename and the current changeset hash
>>> -- we could provide them to the filter via environment variables in
>>> the same fashion as how we provide such information to hooks.
>>> 
>>> I don't really like the encode/decode filters that much, but now that
>>> we have them, then using them might make sense.
>> 
>> Sure technically you are totally correct, but does the above
>> replacement seem like an encode, or a decode operation to you?
> 
> You 'encode' files when you store them in the repository and 'decode'
> them when you write to the working copy. I guess it makes some sense to
> apply the decode filter when you read data from the original repository
> and apply the encode filter when you write data to the new repository.
> 
> You can specify your filter as either an encode or decode filter, both
> will work assuming that they are both applied... So maybe it's stupid to
> overload this mechanism here after all.
> 
> A better option would be to let the user specify a file with patterns
> and corresponding filters:
> 
>  hg convert --filters = myfilters.txt
> 
> where myfilters.txt contains
> 
>  **.c = sed 's|foo|bar|'
> 
> or something like that.

I don't think so. That approach still comes from an encoding / decoding
mindset, and it just doesn't allow surgical enough conversions. E.g., what
happens if I want to do the conversion just within a specific directory,
r".*myrepo/libs/screendrawing\.(h|c|m|cpp|hpp)$" etc., and I don't want it to be
done everywhere? Or I want different conversions for different directories?
Shoe-horning everything through a **.c filter won't allow the user to make these
kinds of changes, and it's highly likely they will want to be able to do such
things. Much better would be to pass the information to the script (either
through command line arguments or env variables) and then the script can do
whatever it wants in a fully general way.
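
To illustrate, here is a minimal sketch of such a script, assuming the file name
arrives in an environment variable. The variable name HG_ORIGINALFILENAME, the
second path pattern, and the helper names are made up for the example and are
not part of any existing Mercurial interface.

```python
import os
import re
import sys

# Hypothetical variable name; the real one would be whatever the patch defines.
FILENAME_VAR = "HG_ORIGINALFILENAME"

# (path pattern, [(old, new), ...]) pairs. The first pattern that matches the
# file name decides which replacements are applied; other files pass through.
RULES = [
    (re.compile(r".*myrepo/libs/screendrawing\.(h|c|m|cpp|hpp)$"),
     [(b"inspectorBWSplitView", b"inspectorSplitView")]),
    (re.compile(r".*myrepo/app/.*\.(h|m)$"),
     [(b"reasonForInvalIdityOfSelectedEntries",
       b"reasonForInvalidityOfSelectedEntries")]),
]


def apply_rules(filename, data):
    """Return data (bytes) with the replacements for filename applied."""
    for pattern, replacements in RULES:
        if pattern.match(filename):
            for old, new in replacements:
                data = data.replace(old, new)
            return data
    return data  # no rule matched: leave the content untouched


def main():
    """Act as a filter: original content on stdin, new content on stdout."""
    filename = os.environ.get(FILENAME_VAR, "")
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(apply_rules(filename, data))
```

Saved as a script with a final call to main(), this would apply one set of
replacements under one path and a different set under another, which a single
**.c pattern cannot express.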

> 
>> Of course changing line endings, or tab conversion seems like an
>> encoding / decoding operation. But the above search and replace
>> operations are not semantically an encode / decode operation in my
>> mind... It would just seem a little odd for a sequence of replacements
>> like:
>> 
>>    inspectorBWSplitView -> inspectorSplitView
>>    reasonForInvalIdityOfSelectedEntries -> reasonForInvalidityOfSelectedEntries
>> 
>> to be thought of as an encoding...
> 
> Yeah, it's of course a strange "encoding". It's really a filter and that
> is what the encode/decode settings allow you to configure, nothing more.

This, together with the reason above, makes me feel that treating this as
encode / decode stuff is just the wrong way to do things...

>> Yep. The one neat thing is git-filter-branch can run a filter over the
>> entire tree at the same time [...]
> 
> I'm not sure where you read that? I cannot find anything about this in
> the 'git help filter-branch' text.

It's the --tree-filter option to git-filter-branch:

     This is the filter for rewriting the tree and its contents. The argument is
     evaluated in shell with the working directory set to the root of the
     checked out tree. The new tree is then used as-is (new files are
     auto-added, disappeared files are auto-removed - neither .gitignore files
     nor any other ignore rules HAVE ANY EFFECT!).

>> In any case. I can make the minor changes to make this work with:
>> 
>>>> contentsfilter -> filter
>>>> using stdin and stdout instead of passing the data via the command line.
>> 
>> Or I can happily leave it to others to hook this up through encode /
>> decode if anyone is interested in doing this?
> 
> I don't think anybody will do it for you -- otherwise it would have been
> done a long time ago :)


So, to sum up: first, it seems there is interest in such a patch.

Second, if I make the patch with:

  (i) contentsfilter -> filter
  (ii) using stdin instead of passing the data via the command line.
  (iii) use environment variables to pass originalFileName and originalHash
        to the script

will such a patch get pulled to crew? (Personally, I have solved the problem
for my immediate needs, and further polishing is of course wasted if it's not
going to go in.)
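
To make (i)-(iii) concrete, the convert side could look roughly like the sketch
below. This is not the actual patch: the helper run_filter, the variable names
HG_ORIGINALFILENAME and HG_ORIGINALHASH, and the stand-in DEMO_FILTER are all
assumptions made for illustration.

```python
import os
import subprocess
import sys


def run_filter(command, data, original_filename, original_hash):
    """Pipe data (bytes) through command and return what it writes to stdout.

    The file name and changeset hash are handed over as environment
    variables, in the same spirit as hooks. Both names are hypothetical.
    """
    env = dict(os.environ)
    env["HG_ORIGINALFILENAME"] = original_filename
    env["HG_ORIGINALHASH"] = original_hash
    result = subprocess.run(command, input=data, stdout=subprocess.PIPE,
                            env=env, check=True)
    return result.stdout


# Stand-in filter for the demo: rewrites foo -> bar, but only in .c files.
DEMO_FILTER = [
    sys.executable, "-c",
    "import os, sys\n"
    "data = sys.stdin.buffer.read()\n"
    "if os.environ['HG_ORIGINALFILENAME'].endswith('.c'):\n"
    "    data = data.replace(b'foo', b'bar')\n"
    "sys.stdout.buffer.write(data)\n",
]

if __name__ == "__main__":
    new_data = run_filter(DEMO_FILTER, b"int foo;\n", "src/main.c", "0" * 40)
    sys.stdout.buffer.write(new_data)
```

The command is passed as an argument list here to keep the sketch free of shell
quoting; a real option would more likely take a shell command string from the
user.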

If someone else in crew instead wants to go with an alternative method and
wants to write it, then of course please go ahead...

Cheers,
  Jas

