Dealing with binary files (was Re: [PATCH]Make hg diff go nice on binary files)

Matt Mackall mpm at
Fri Aug 26 15:26:43 CDT 2005

On Wed, Jul 27, 2005 at 11:54:02PM -0400, Theodore Ts'o wrote:
> On Wed, Jul 27, 2005 at 05:29:02PM -0700, Matt Mackall wrote:
> > On Wed, Jul 27, 2005 at 07:52:32PM -0400, Kevin Smith wrote:
> > > Matt Mackall wrote:
> > > >Looked at another way, there are exactly three things we'll use a
> > > >binary flag for:
> > > >
> > > >- deciding whether we can diff/export/annotate
> > > >- deciding whether to merge
> > > >- deciding how to display something in hgweb
> > > 
> > > Probably true. However, a fourth item is at least related: newline 
> > > mangling. Either we say that text == mangled, OR we need another very 
> > > similar flag to allow users to indicate which files should be mangled 
> > > and which should not.
> > 
> > I'm pretty strongly against building any file mangling into Mercurial,
> > either for locale conversion or newline conversion.
> > 
> > I'm willing to provide a hook for commit-time and checkout-time
> > filtering, but then the user is responsible for all the pain thus
> > incurred.
> 
> Even if we do this in a separate wrapper program, that program needs
> some way to know whether or not to do newline mangling --- there may
> be files where even though it is a text file, the user may not want to
> do eoln mangling on that particular text file (for one reason or
> another).  
> 
> Also, as we discussed at the Kernel Summit BOF session, there may be
> other more general uses where it may be useful to be able to specify a
> specialized pipeline of canonicalization and decanonicalization filters
> on a per-file basis.  For example, if a particular file is an
> openoffice file or some other file which is a compressed XML stream,
> decompressing the XML stream before checking in the file will allow
> for more efficient diffs to be stored in the SCM.  Then when the file
> is checked out into the working directory, the XML stream can be
> recompressed so that openoffice can work with the file.
> 
> Another use that would be nice to implement as a filter would be a way
> of expanding RCS and/or SCCS keywords when the file is checked out,
> but to make them go away before doing the checkin so as not to
> contaminate the diffs.  BitKeeper had this functionality, and I miss
> it.  But I can understand not wanting to contaminate the core
> mercurial functionality with this feature, so doing it as some kind of
> plugin or hook makes perfect sense.
> 
> So it would be nice if there was a way of associating a pipeline of
> filters on a per-file basis.  From a user convenience point of view it
> would be really cool if it could be stored in the file's metadata, and
> could be changed as part of a changeset operation.  (Perhaps "hg
> admin" to set and get the pipeline of filters for each file.)
> 
> If this is too much complexity, a slightly more kludgy, but simpler to
> implement method would be to create a file which associates regular
> expressions with a set of filter pipelines.  
> 
> In any case I think that implementing the capability of associating a
> pipeline of filters on a per-file basis is a much cooler way of
> implementing functionality which in BitKeeper is implemented as a
> series of special-case hacks (for eoln termination, RCS keyword
> expansion, SCCS keyword expansion, etc.)  The fact that it also allows
> us to do something BitKeeper can't --- a more efficient way of dealing
> with compressed XML files from OpenOffice --- also makes this approach
> very appealing, and IMHO suggests that it may be the Right Way to add
> a lot of functionality while keeping the core SCM as simple as
> possible.

Update on this:

I've set things up so that all working directory file I/O now goes
through localrepository.wread and .wwrite. So all that remains is to
add the filtering hooks to these functions.
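
As a rough illustration of where such hooks could sit, here is a minimal
sketch in Python. Only the names wread and wwrite come from the message
above; the FilteredWorkingDir class, the rule format (a regular
expression paired with an "encode" and a "decode" pipeline), and the two
newline filters are assumptions made up for this example, not
Mercurial's actual code.

```python
import re


def crlf_to_lf(data):
    """Hypothetical filter: canonicalize line endings on the way in."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data):
    """Hypothetical filter: restore CRLF on the way out.

    Normalizes first so that existing CRLF pairs are not doubled.
    """
    return crlf_to_lf(data).replace(b"\n", b"\r\n")


class FilteredWorkingDir:
    """Stand-in for the working-directory I/O layer (not the real class).

    files: dict mapping file name -> bytes, simulating the working directory.
    rules: list of (regex, encode_pipeline, decode_pipeline) tuples.
    """

    def __init__(self, files, rules):
        self.files = files
        self.rules = [(re.compile(pat), enc, dec) for pat, enc, dec in rules]

    def _filters(self, name, direction):
        # The first rule whose pattern matches the file name wins.
        for pattern, encode, decode in self.rules:
            if pattern.search(name):
                return encode if direction == "encode" else decode
        return []  # no rule: data passes through unchanged

    def wread(self, name):
        """Read from the working directory, applying the encode pipeline."""
        data = self.files[name]
        for f in self._filters(name, "encode"):
            data = f(data)
        return data

    def wwrite(self, name, data):
        """Apply the decode pipeline, then write to the working directory."""
        for f in self._filters(name, "decode"):
            data = f(data)
        self.files[name] = data


# Usage: only *.txt files get newline conversion; the PNG is left alone.
rules = [(r"\.txt$", [crlf_to_lf], [lf_to_crlf])]
wd = FilteredWorkingDir(
    {"notes.txt": b"a\r\nb\r\n", "logo.png": b"\x89\r\n"}, rules
)
canonical = wd.wread("notes.txt")
wd.wwrite("notes.txt", canonical)
```

Because every read and write funnels through these two functions, the
per-file pipeline lookup lives in one place, and a file that matches no
rule is passed through byte for byte.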

Mathematics is the supreme nostalgia of our time.
