User metadata support

Matt Mackall mpm at selenic.com
Mon Nov 20 09:56:10 CST 2006


On Sun, Nov 19, 2006 at 08:54:51PM +0100, Guenther Brunthaler wrote:
[metadata]
> You might think nobody actually needs such a thing?

I think it might be useful, but I also think it can't be done right.

> Let's illustrate a few cases where such metadata would be highly useful:
> 
> * Additional permission bits. Currently, Mercurial supports the
> executable bit right out of the box. Fine. But what if more permission
> bits should be associated with a file, such as the sticky bit. Or
> creating a file as read-only. Or a special POSIX ACL. If hooks for
> checkin/checkout had access to file metadata, the hooks could set the 
> appropriate bits on checkout as required, and without a need to 
> integrate such features into the Mercurial core.

ACLs are a perfect example of things Mercurial shouldn't care about.
They're not portable from one user to another on the same box, let
alone from one OS to another; they have no sane merge semantics; and
they're arbitrarily complex.

If you're going to need a hook to deal with them anyway, just check in
a .acl file that contains the information you need and have the hook
process it. Mercurial stays simple, and your metadata gets handled
precisely the way your project needs.
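
As a rough illustration of that approach, a hook script could look
something like the sketch below. The .acl format used here (one
"<octal mode> <path>" pair per line) and the function names are made
up for the example; nothing in it is a Mercurial API, and real ACLs
would need something richer than chmod.

```python
import os


def parse_acl_file(text):
    """Parse lines of the form '<octal mode> <path>' into (mode, path) pairs.

    Blank lines and lines starting with '#' are skipped. The format is
    hypothetical, invented for this example.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace so paths may contain spaces.
        mode_text, path = line.split(None, 1)
        entries.append((int(mode_text, 8), path))
    return entries


def apply_acl_file(root, acl_name=".acl"):
    """Apply the modes listed in root/.acl to existing files under root.

    Returns the list of relative paths whose mode was set.
    """
    acl_path = os.path.join(root, acl_name)
    if not os.path.exists(acl_path):
        return []
    with open(acl_path) as f:
        entries = parse_acl_file(f.read())
    applied = []
    for mode, rel in entries:
        target = os.path.join(root, rel)
        if os.path.exists(target):
            os.chmod(target, mode)
            applied.append(rel)
    return applied
```

A script like this would presumably be run from whatever hook your
setup fires after a checkout, with the repository root passed as
`root`. Because .acl is an ordinary tracked file, it gets versioned
and merged like any other text.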
 
> * Line-ending conversion. While I agree that line-ending conversions
> should normally be performed based on heuristics because users tend to 
> forget about setting special properties, there are exceptions. What if a 
> file with extension .txt is a texture in some project subdirectory 
> rather than a text file like in the rest of the project? If the 
> autodetection heuristics for binary files fail, we'll be screwed as 
> soon as line-ending conversion is attempted on that file. Using a 
> property such as "hg:eol-style" set to "binary" would let a hook script 
> override autodetection in such cases.

This is also a frequently asked question. From the wiki (BinaryFiles):

- If you can't autodetect the file type, you will lose.

  Users are lazy and special cases are infrequent. This means that any
  scheme that relies on users manually marking special file types will
  fail. Users will consistently forget to mark special files in the
  rare case where it is needed.

  This means that more often than not, special files will be handled
  incorrectly! For instance, users will almost always forget to mark
  binary files on commit, only to discover that it blows up at the
  next merge when it's too late. Worse, we've now got immutable
  history that's permanently incorrect.

- If you can autodetect the file type, you don't need to track it.

  You just need to adapt your process to detect the types of files you
  care about. For instance, modify the sample hgmerge script to detect
  your special files.
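
  A minimal sketch of what such detection might look like, assuming
  the common "contains a NUL byte" heuristic. The function names and
  the 8192-byte sniff size are choices made for this example, not
  anything taken from hgmerge:

```python
def looks_binary(data):
    """Guess whether a bytes object is binary: True if it contains a NUL byte."""
    return b"\0" in data


def file_looks_binary(path, sniff_bytes=8192):
    """Apply the NUL-byte heuristic to the first sniff_bytes of a file."""
    with open(path, "rb") as f:
        return looks_binary(f.read(sniff_bytes))
```

  A merge or conversion script could call file_looks_binary() on each
  file and skip line-ending conversion or text merging when it returns
  True. A project with unusual file types would add its own checks
  next to this one.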

> * Character set conversion. What if a single directory contains text
> files in different character set encodings? Just think of text files on
> a Windows machine which shall also be edited on a UTF-8 Linux 
> workstation: On the Windows side, most files will be using the "ANSI" 
> character set (in fact WINDOWS-1252 because of that EURO-Symbol), but 
> some files intended to be used by the Console are instead represented 
> using the "OEM" character set (IBM CP 437 or CP 850). Using the same 
> conversion for all text files cannot work in this case. And they all 
> share the same filename extension. It is necessary to override the 
> conversions on a per-file basis. Metadata properties would allow the 
> hook to also take care of this.

See above.

> * Stream metadata. Machines like the Apple Macintosh can use different
> streams in a file, the so-called "data fork" and "resource fork".

Multiple streams are the biggest filesystem misfeature ever, and even
Apple had the good sense to deprecate them. Their primary purpose in
Windows land is to introduce security holes. Now I'm going to have
nightmares about Mercurial invisibly checking in trojans hiding in
text files, thanks.

> But I would really prefer Mercurial - if only it could support 
> properties like symlinks and character conversion attributes.

Symlink support will happen.

--
Mathematics is the supreme nostalgia of our time.

