[PATCH 0 of 1] Interest for convert option --contentsfilter?

Mon Apr 18 06:58:55 CDT 2011

On Apr 18, 2011, at 1:35 PM, Martin Geisler wrote:

> Jason Harris <jason at jasonfharris.com> writes:
> 
>> On Apr 18, 2011, at 12:00 PM, Martin Geisler wrote:
>> 
>>> The filtering on filenames makes me think of our encode/decode
>>> filters... could we instead use that API and infrastructure? So
>>> instead of a new option to convert we let convert read/write data
>>> through the normal encode/decode filters. That already allows for
>>> calling both shell scripts and Python functions.
>> 
>> Maybe, but the point is that it's just one type of filtering.
>> 
>> I might have another filter which is
>> 
>> newContents = re.sub(r"(\W)BrowserView(\W)", r"\1FileView\2", originalContents)
>> newContents = re.sub(r"(\W)inspectorBWSplitView(\W)", r"\1inspectorSplitView\2", newContents)
>> newContents = re.sub(r"(\W)reasonForInvalIdityOfSelectedEntries(\W)", r"\1reasonForInvalidityOfSelectedEntries\2", newContents)
>> 
>> or something like that, i.e. a series of global replacements on the
>> contents to correct annoying mistakes.
>> 
>> Or I could restrict such changes to just certain functions since the
>> hash is passed in. In fact, although it's not passed in now, one could
>> imagine also passing in the rev number and only doing a
>> transformation for a certain range of changesets.
>> 
>> Thus I don't see this as exactly the same as a normal encode / decode
>> filter. I guess, though, that one common use for it will be to fix EOLs
>> everywhere and consistently throughout the repo.
> 
> The encode/decode filters get the original file content on stdin and
> have to produce the new file content on stdout. Such a filter can do the
> above substitutions just fine. The only part missing is information about
> the current filename and the current changeset hash -- we could provide
> them to the filter via environment variables in the same fashion as how
> we provide such information to hooks.
> 
> I don't really like the encode/decode filters that much, but now that we
> have them, using them might make sense.

Sure, technically you are totally correct, but does the above replacement seem
like an encode or a decode operation to you? Of course, changing line endings
or converting tabs seems like an encoding / decoding operation. But the above
search and replace operations are not semantically an encode / decode operation
in my mind... It would just seem a little odd for a sequence of replacements
like:

   inspectorBWSplitView -> inspectorSplitView
   reasonForInvalIdityOfSelectedEntries -> reasonForInvalidityOfSelectedEntries

to be thought of as an encoding...
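
For concreteness, here is a rough sketch of the replacements above written as
a stdin/stdout style filter. It is only an illustration under assumptions: the
environment variable name HG_FILTER_NODE and the skip_nodes parameter are
hypothetical and not part of any existing interface. It uses \b word
boundaries rather than the (\W)...(\W) form, so an identifier at the very
start or end of the content is also matched.

```python
import re

# Old identifier -> new identifier, taken from the examples in this thread.
RENAMES = {
    "BrowserView": "FileView",
    "inspectorBWSplitView": "inspectorSplitView",
    "reasonForInvalIdityOfSelectedEntries": "reasonForInvalidityOfSelectedEntries",
}

# Hypothetical variable name; nothing in this thread defines it.
NODE_VAR = "HG_FILTER_NODE"


def apply_renames(text, renames=RENAMES):
    """Replace each old identifier with its new name, whole words only."""
    for old, new in renames.items():
        # re.escape guards against regex metacharacters in the old name.
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text


def filter_stream(src, dst, environ, skip_nodes=frozenset()):
    """Read content from src and write the filtered content to dst.

    Content from changesets whose hash is in skip_nodes passes through
    unchanged; everything else gets the renames applied.
    """
    data = src.read()
    if environ.get(NODE_VAR) not in skip_nodes:
        data = apply_renames(data)
    dst.write(data)
```

Run as a script, this would be driven by something like
filter_stream(sys.stdin, sys.stdout, os.environ).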

>> Thus would this be accepted / included with the name change:
>> 
>> contentsfilter -> filter
>> using stdin and stdout instead of passing the data via the command line.
>> 
>> I didn't mention it, but some people might already know that git has
>> such an option, git-filter-branch, and this allows Mercurial to do the
>> same thing...
> 
> Yeah, I would like to see Mercurial provide the same filtering
> capabilities in a clean way.

Yep. The one neat thing is that git-filter-branch can run a filter over the
entire tree at once, so if there are 1000 files in the checkout it needs only
one external system call, and thus the conversion can potentially be much faster
than incurring the bash / python / sed startup time for each file. Still,
how often are such whole-tree conversions going to be done? If it takes 5
times longer, it's not that much of a problem.
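
To illustrate the difference, a whole-tree filter would receive every file of
a changeset in one call instead of being started once per file. A minimal
sketch, assuming the filter is a Python function; filter_tree and
rename_identifier are hypothetical names, not an existing Mercurial or git
interface:

```python
import re


def rename_identifier(text, old, new):
    """Replace whole-word occurrences of old with new."""
    return re.sub(r"\b" + re.escape(old) + r"\b", new, text)


def filter_tree(files, content_filter):
    """Apply content_filter(path, text) to every file of one changeset.

    files maps path -> text content. A new mapping is returned and the
    input is left untouched.
    """
    return {path: content_filter(path, text) for path, text in files.items()}
```

A per-changeset call might then look like
filter_tree(tree, lambda path, text: rename_identifier(text, "BrowserView", "FileView")).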

In any case, I can make the minor changes to make this work with:

>> contentsfilter -> filter
>> using stdin and stdout instead of passing the data via the command line.

Or I can happily leave it to others to hook this up through encode / decode,
if anyone is interested in doing this.

Cheers,
  Jas