Diff for "BinaryFiles"

Differences between revisions 3 and 4

Why tracking file types is a bad idea

'Binary file' is an ill-defined concept
- Text files in other encodings can easily be mistaken for binary files; UTF-16 text, for example, contains NUL bytes. On some systems text files (scripts, for instance) are also executable, which blurs the line further.
If you can't autodetect the file type, you will lose
- Users are lazy, and special cases are infrequent. Any scheme that relies on users manually marking special file types will therefore fail: in the rare case where marking is needed, users will consistently forget. As a result, special files will be handled incorrectly most of the time. For instance, users will almost always forget to mark a binary file on commit and only discover the mistake when the next merge blows up, by which point it's too late. Worse, the mistake is now recorded in immutable history that is permanently incorrect.
If you can autodetect the file type, you don't need to track it
- You just need to adapt your process to detect the types of files you care about. For instance, modify the sample hgmerge script to detect your special files.

What Mercurial does with binary files

Mercurial generally makes no assumptions about file contents. Thus, most things in Mercurial work fine with any type of file.

The exceptions are commands like diff, export, and annotate, which work well on files intended to be read by humans, and merge, where processing binary files makes very little sense at all.

The question naturally arises: what is a binary file anyway? There is really no good answer, so Mercurial uses the same heuristic that programs like diff(1) use: a file is treated as binary if there are any NUL characters in roughly its first 1K.
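
As a rough illustration, the NUL-byte test can be sketched in a few lines of Python. This is not Mercurial's own code: the function name `looks_binary` is hypothetical, and the 1024-byte window is an assumption standing in for "1K or so".

```python
def looks_binary(data: bytes, window: int = 1024) -> bool:
    """Hypothetical helper: treat data as binary if a NUL byte
    appears within the first `window` bytes (assumed 1024)."""
    return b"\0" in data[:window]


print(looks_binary(b"hello world\n"))           # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0"))   # True
print(looks_binary("hi".encode("utf-16-le")))   # True: UTF-16 text contains NULs
print(looks_binary(b"a" * 2000 + b"\0"))        # False: NUL lies past the window
```

The last two calls show both weaknesses of the heuristic: UTF-16 text is misclassified as binary, and a NUL that appears only after the window goes unnoticed.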

For diff, export, and annotate, this heuristic gets things right almost all of the time, and these commands will not attempt to process files they consider binary. If necessary, you can force them to treat files as text with -a.
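
The behaviour described above can be sketched in Python: skip files that look binary unless the caller forces text mode. This is an illustrative sketch, not Mercurial's implementation. `sketch_diff`, its `text` parameter (standing in for -a), and the "Binary file ... has changed" message are assumptions; the text path uses the standard library's `difflib.unified_diff`.

```python
import difflib


def looks_binary(data: bytes, window: int = 1024) -> bool:
    # Same assumed NUL-byte heuristic: any NUL in the first `window` bytes.
    return b"\0" in data[:window]


def sketch_diff(name: str, old: bytes, new: bytes, text: bool = False) -> str:
    """Hypothetical diff front end: refuse to diff binary-looking files
    unless `text` is True (analogous to forcing text mode with -a)."""
    if not text and (looks_binary(old) or looks_binary(new)):
        # Only report that the file changed; don't print its contents.
        return f"Binary file {name} has changed\n" if old != new else ""
    a = old.decode("utf-8", errors="replace").splitlines(keepends=True)
    b = new.decode("utf-8", errors="replace").splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, f"a/{name}", f"b/{name}"))


print(sketch_diff("notes.txt", b"one\ntwo\n", b"one\nthree\n"), end="")
print(sketch_diff("logo.png", b"\0old", b"\0new"), end="")
```

Forcing `text=True` on a genuinely binary file still produces a diff, but one that is rarely useful to a human reader.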

Merging is another matter. Mercurial delegates the actual merging of individual files entirely to external programs, and it doesn't try to tell these programs which files they can and cannot merge.

The example merge script hgmerge currently makes no attempt to treat particular file types specially, but it could easily be extended to do so. Exactly what you would want to do with such files, though, depends on the file type and on your project's needs.
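
In the spirit of detecting special files in the merge helper instead of tracking their type, here is a Python sketch of the decision an extended helper might make for one file. It is not written in the language of the sample hgmerge script and is not part of Mercurial: `choose_merge_action`, the `NO_TEXT_MERGE` extension list, and the returned action names are all hypothetical.

```python
import os


def looks_binary(data: bytes, window: int = 1024) -> bool:
    # Assumed NUL-byte heuristic, as used by diff(1)-style tools.
    return b"\0" in data[:window]


# Hypothetical per-project rule: extensions that must never be text-merged.
NO_TEXT_MERGE = {".png", ".jpg", ".zip", ".pdf"}


def choose_merge_action(path: str, local: bytes, base: bytes, other: bytes) -> str:
    """Decide how a merge helper might treat one file (illustrative only)."""
    if local == other:
        return "identical"      # both sides agree: nothing to merge
    if base == local:
        return "take-other"     # only the other side changed the file
    if base == other:
        return "take-local"     # only our side changed the file
    ext = os.path.splitext(path)[1].lower()
    if ext in NO_TEXT_MERGE or any(looks_binary(d) for d in (local, base, other)):
        return "manual"         # both sides changed a binary file: ask the user
    return "text-merge"         # hand off to a line-based 3-way merge tool


print(choose_merge_action("logo.png", b"v1", b"v0", b"v2"))  # manual
print(choose_merge_action("README", b"1\n", b"0\n", b"2\n"))  # text-merge
```

Checking the trivial cases first means a binary file changed on only one side merges without any help; only genuine conflicts on binary files need a human.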

Revision 3 as of 2005-09-15 20:22:36 (size: 1314, editor: mpm) → Revision 4 as of 2006-11-08 22:08:07 (size: 2444, editor: 10)