/!\ This page is primarily intended for Mercurial's developers.

(Please see the current BinaryFiles page for why tracking 'file types' is always a bad idea -mpm)

As mpm has pointed out, the notion of BinaryFiles is problematic. Many tools use the same "search for NUL" heuristic that Mercurial uses, but it misclassifies some files, and it does not really help us select the right merge tool.
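To make the limitation concrete, here is a minimal sketch of a NUL-byte test in Python. The helper name `looks_binary` is hypothetical and is not Mercurial's actual API; it only illustrates the idea of the heuristic. It shows one case that is arguably misclassified in each direction: UTF-16 text is flagged as binary, while program-written XML is flagged as text.

```python
def looks_binary(data: bytes) -> bool:
    """Hypothetical helper: treat any content containing a NUL byte as binary."""
    return b"\0" in data


# Plain UTF-8 text has no NUL bytes, so the heuristic calls it text.
utf8_text = "hello world\n".encode("utf-8")

# The same text in UTF-16 contains NUL bytes, so the heuristic calls it binary.
utf16_text = "hello world\n".encode("utf-16")

# Program-written XML has no NUL bytes, so the heuristic calls it text,
# even if a line-based merge of it would be unwise.
generated_xml = b"<?xml version='1.0'?><data a='1'/>"

for name, blob in [("utf8", utf8_text), ("utf16", utf16_text), ("xml", generated_xml)]:
    print(name, "binary" if looks_binary(blob) else "text")
```

In both failure cases the answer says nothing about which merge tool would be appropriate, which is the second half of the objection above.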

As pointers, here are two earlier discussions related to this topic, both concerning handling of character sets:

Some definitions:

The current .hgrc mechanism provides a means to assign an encode/decode tool on the basis of file name matching. This is a fine way to specify default behavior for a large number of files, but we found in OpenCM that there were always exceptions. A particularly unpleasant exception is XML: XML content written by programs is effectively binary content, while XML content written by humans is text content.

Our plan in OpenCM was to record an (optionally specified) notion of type for each file in the manifest. A type is simply a unique name -- it has no intrinsic semantics. Other tools, including the encode/decode tools and the merge tools, can find out what the type name is and use that information to decide how to process the file. In the absence of an explicitly specified type, heuristics similar to the ones currently used in .hgrc can be applied, with the 'check for NUL' test providing an ultimate fallback to binary.

In the context of Mercurial, this would imply some minor changes. I think they can be done backwards compatibly:

So here is a specific proposal to serve as a starting point of discussion:

  1. There should be a means to specify a per-file type name.
  2. In the absence of a specified type, heuristics in .hgrc are applied to determine the type.
  3. If that does not resolve the type, the current NUL check is used, resulting in one of the types binary or text.
  4. Type names should be passed to hgmerge.
  5. Selection of the encode/decode strategy should be based on the type name, not the glob match.
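The resolution order in points 1 to 3 can be sketched as follows. This is only an illustration under stated assumptions, not an existing Mercurial interface: the function `resolve_type`, the `explicit_types` mapping (standing in for a type recorded per file in the manifest), the `patterns` list (standing in for the .hgrc heuristics), and all type names other than binary and text are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_type(path, data, explicit_types, patterns):
    """Return a type name for `path` using the three-step order proposed above.

    explicit_types: dict of path -> type name (hypothetical per-file record).
    patterns: list of (glob, type name) pairs (hypothetical .hgrc heuristics).
    """
    # 1. An explicitly specified per-file type always wins.
    if path in explicit_types:
        return explicit_types[path]
    # 2. Otherwise fall back to name-matching heuristics, first match wins.
    for glob, typename in patterns:
        if fnmatchcase(path, glob):
            return typename
    # 3. Ultimate fallback: the NUL check yields either "binary" or "text".
    return "binary" if b"\0" in data else "text"


# Example inputs; every name below is made up for illustration.
explicit = {"build/report.xml": "generated-xml"}
patterns = [("*.xml", "xml-text"), ("*.png", "image")]

print(resolve_type("build/report.xml", b"<r/>", explicit, patterns))
print(resolve_type("docs/manual.xml", b"<doc/>", explicit, patterns))
print(resolve_type("README", b"plain words\n", explicit, patterns))
print(resolve_type("core", b"\x7fELF\0\0", explicit, patterns))
```

Under points 4 and 5, the returned name is what would be handed to the merge tool and used to pick the encode/decode strategy, so the exceptional XML file above is treated differently from hand-written XML even though both match the same glob.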

NewIdeas/FileTypes (last edited 2012-05-13 09:59:52 by 62)