encode/decode filter hooks (was: Fun stuff in tip)

Thu Sep 15 07:55:20 CDT 2005

Matt Mackall wrote:
> I've also finished up the file filtering code. This allows you to
> specify arbitrary file filtering for checkin/checkout in hgrc, eg:
> 
> [encode]
> *.gz = gunzip
> 
> [decode]
> *.gz = gzip
> 
> This can also be used to handle line ending issues via
> dos2unix/unix2dos and expansion of variables ala CVS.

Sounds cool, but will need some good documentation.

If I understand your example above, any .gz files in the project would
be decompressed at checkin, before hg stores them, which would improve
the chances of delta storage saving space.

Fortunately, gzip is smart enough not to gzip or ungzip something twice. 
Apparently dos2unix/unix2dos are as well. Not all tools will be, and
those tools pose a danger, especially when a new rule is added to an 
existing repo. As long as the in-repo files are already in the encoded 
format, it's ok. But if they are currently stored in the decoded format, 
and the decoder can't detect that, they will end up getting double-decoded.
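
For a tool that lacks such a check, a wrapper could guard against
double conversion. A minimal sketch, assuming the gzip case from the
example: look for the gzip magic bytes (0x1f 0x8b) before converting in
either direction. The function names are hypothetical and not part of
hg; a real filter would be an external command reading stdin and
writing stdout.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Heuristic: does the data start with the gzip magic bytes?"""
    return data[:2] == GZIP_MAGIC


def encode_guarded(data: bytes) -> bytes:
    """Checkin direction (the [encode] rule): decompress only if compressed."""
    if looks_gzipped(data):
        return gzip.decompress(data)
    return data  # already in the in-repo (uncompressed) form


def decode_guarded(data: bytes) -> bytes:
    """Checkout direction (the [decode] rule): compress only if not compressed."""
    if looks_gzipped(data):
        return data  # already compressed; do not gzip it a second time
    return gzip.compress(data)
```

The check is only a heuristic: a file that is not gzip data but happens
to begin with those two bytes would be misdetected.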

Is there any mechanism to do a one-time conversion to apply a new 
[encode] rule to all the files in an existing repo?

Also, is it true that these rules can only be applied via filename 
pattern matching? (As opposed to being detected by the 'type' or 'file' 
command.) If so, then files like README will have to be matched by
specific name, which is ok. I assume these patterns can contain
directory names, which will help.
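
To make the question concrete, here is a sketch of how I imagine
name-based rule lookup behaving, using Python's fnmatch module. This is
an assumption for illustration only; the rule table, the README and
docs entries, and the filter_for function are hypothetical, and hg's
actual matcher may well treat patterns differently.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (filename pattern, filter command).
RULES = [
    ("*.gz", "gunzip"),
    ("README", "dos2unix"),
    ("docs/*.txt", "dos2unix"),
]


def filter_for(path, rules=RULES):
    """Return the command of the first rule whose pattern matches path."""
    for pattern, command in rules:
        # fnmatchcase's "*" also matches "/" characters, so "*.gz"
        # applies to files in any subdirectory.
        if fnmatchcase(path, pattern):
            return command
    return None  # no rule applies; the file is left unfiltered
```

Under these assumptions, filter_for("src/data/a.gz") picks the gunzip
rule, filter_for("README") picks dos2unix, and filter_for("docs/README")
matches nothing, because the bare README pattern carries no directory
part.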

Kevin