Mercurial-devel Digest, Vol 2, Issue 25

Guenther Brunthaler spam_me_not_please_dont at gmx.nospam.net
Mon Nov 20 14:04:28 CST 2006


> From: Matt Mackall <mpm at selenic.com>
> Subject: Re: User metadata support

> I think it might be useful, but I also think it can't be done right.

I'm sorry to hear that!

Certainly it's not a feature to be implemented overnight.

But, perhaps, in some future version?

> ACLs are a perfect example of things Mercurial shouldn't care about.

I agree. That's why I think user-defined hooks should take care of such 
things, if users actually need them.

But hooks need metadata to operate on.

So, I'm *not* suggesting building ACLs, symlinks or other such things 
into Mercurial.

Instead, users should be provided with some means of setting and 
accessing metadata for each version-controlled object.

User-defined hooks can then implement that functionality, all without 
Mercurial having to know what the user hooks do with the metadata 
streams it maintains.

> They're not portable from one user to another on the same box let
> alone one OS to another, there's no sane merge semantics, and they're
> arbitrarily complex.

That would actually not be a problem for Mercurial if something like the 
suggested metadata stream feature was implemented: Just because the 
metadata is *there*, a client is under no obligation to install a hook 
or otherwise make use of the metadata at all.

For instance, think of symlinks stored as metadata.

If the client is a UNIX box, it clearly has the capability of making use 
of "symlink" metadata streams.

So the user who checks out the repository can install a hook which 
creates or updates symlinks on checkout. Just the same way a user might 
install a hook for keyword expansion.

If the same repository is checked out on a Windows box, where there are 
no symlinks, the user will simply *not* install a hook which handles 
symlink metadata streams.

In that case, no symlinks will be created and nothing happens. But the 
metadata is still there and part of the repository; it's just dormant.

Or the Windows user will choose to install a hook for symlink metadata - 
but one which creates copies of the symlinked files on checkout, and 
resynchronizes changes back, rsync-style, on checkin.

It's completely up to the user: Mercurial need not care about it!
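
To make the symlink example a bit more concrete, here is a rough sketch 
in Python of what such a user hook might do. Nothing here is existing 
Mercurial API: the function name, the use_symlinks switch and the dict 
that maps a path to its stored link target are all made up for 
illustration, and how the hook would actually obtain the metadata 
streams is left open.

```python
import os
import shutil


def apply_symlink_streams(workdir, symlink_streams, use_symlinks):
    """Materialize hypothetical "symlink" metadata streams in a checkout.

    symlink_streams maps a path (relative to workdir) to the link target
    stored in its metadata stream. With use_symlinks=False the target
    file is copied instead, as a hook on a box without symlinks might do.
    """
    created = []
    for rel_path, target in sorted(symlink_streams.items()):
        link_path = os.path.join(workdir, rel_path)
        os.makedirs(os.path.dirname(link_path), exist_ok=True)
        # Replace whatever an earlier checkout left behind.
        if os.path.lexists(link_path):
            os.remove(link_path)
        if use_symlinks:
            os.symlink(target, link_path)
        else:
            # A relative target is resolved against the link's directory,
            # the same way the filesystem would resolve a real symlink.
            source = os.path.join(os.path.dirname(link_path), target)
            shutil.copyfile(source, link_path)
        created.append(rel_path)
    return created
```

A user who installs no such hook simply never calls anything like this, 
and the stored targets stay dormant in the repository.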

For Mercurial, metadata streams are just files.

Files to be version controlled.

The difference between the current way Mercurial operates and the 
suggested way is only that it then would have to hide the files at the 
leaf level of the file tree.

Because those files will be *interpreted* as streams.

But for Mercurial, they are *not* streams - they are just files to be 
kept under the control of a revlog.

So changes are needed at the "high level" only; no changes to the 
underlying revlog or repository layout format are required.

Plus there is the bonus that it is then possible to version-control 
directories also at no additional cost.

> If you're going to need a hook to deal with them anyway, just check in
> a .acl file that contains the information you need and have the hook
> process it.

I have thought of that already.

But the problem is: The contents of that file need to be synchronized 
on file renames or moves.

But even if it was implemented that way: Monotone works exactly that 
way, using its .mtattr file (as far as I can remember).

It did not work well.

Actually, the shortcomings of .mtattr were the main reason why I 
abandoned Monotone (aside from that awful Lua scripting language) and 
turned to SVK which can do all that right out of the box.

But there is also a more fundamental reason: The metadata streams for a 
file or directory are unrelated to each other, so it feels a bit 
artificial to stuff them together into a single file.

Another aspect is versioning: If the metadata streams of different 
filesystem objects are kept in different revlogs, they can also be 
versioned independently of each other.

If all the metadata was stuffed into a single file, that file would 
change with every version when the metadata of any object in the file 
tree was changed.

If metadata is kept in independent revlogs, metadata is versioned exactly 
the same way file contents are. (Especially as file contents are one 
specific kind of metadata stream.)
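
The versioning difference can be illustrated with a small Python 
sketch. It only models the idea with hashes; it is not how revlogs 
work internally, and the function names and the (path, stream name) 
keys are invented for the example.

```python
import hashlib


def combined_digest(metadata):
    """Digest of one file holding every object's metadata (.mtattr style)."""
    lines = [f"{path} {name} {value}"
             for (path, name), value in sorted(metadata.items())]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def per_stream_digests(metadata):
    """One digest per (path, stream name), as if each had its own revlog."""
    return {key: hashlib.sha1(value.encode()).hexdigest()
            for key, value in metadata.items()}


def changed_streams(old, new):
    """Return the streams whose content differs between two versions."""
    a, b = per_stream_digests(old), per_stream_digests(new)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


before = {("somefile", "readonly"): "yes",
          ("somelink", "hg_symlink"): "somefile"}
after = dict(before)
after[("somefile", "readonly")] = "no"

# The single combined file gets a new version for any change at all ...
print(combined_digest(before) != combined_digest(after))
# ... while with independent streams only the affected one changes.
print(changed_streams(before, after))
```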

Another issue is performance: There is no reason to restrict the size of 
metadata streams in any way.

Metadata streams may contain short text strings as well as long binary 
data objects.

For Mercurial it's all just binary files; it won't care about the 
contents of the streams at all: It checks them in and out, and creates 
binary deltas from them, like it already does for normal files.

So it might not be the best idea to merge them together into a single 
file anyway.

So, in order to do things right, it would currently be necessary to 
create two parallel subdirectory structures for each project: One 
subtree for the file data, and the other subtree for the remaining data 
streams for the hooks to act upon.

For instance, instead of my suggested layout

./hg/data/somedir/someotherdir/somefile/hg_data.d
./hg/data/somedir/someotherdir/somefile/readonly.d
./hg/data/somedir/someotherdir/somelink/hg_symlink.d

("readonly" here is an example of a user-provided metadata stream with 
no relevance to Mercurial itself) then the following structure could be 
used:

./hg/data/somedir/someotherdir/somefile.d
./hg/data/.metadata/somedir/someotherdir/somefile/readonly.d
./hg/data/.metadata/somedir/someotherdir/somelink/hg_symlink.d

which will check out a directory .metadata containing the metadata for 
the files in the main tree.
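
The two layouts can be written down as path-mapping functions. This is 
only a Python sketch of the naming schemes shown above; the function 
names are invented, and the "hg_data" default stream name is taken from 
the example listing rather than from anything Mercurial defines.

```python
def builtin_layout(path, stream="hg_data"):
    """Suggested layout: every stream is a leaf below the object's path."""
    return f"./hg/data/{path}/{stream}.d"


def diy_layout(path, stream=None):
    """Do-it-yourself layout: file data as usual, streams under .metadata."""
    if stream is None:
        return f"./hg/data/{path}.d"
    return f"./hg/data/.metadata/{path}/{stream}.d"


print(builtin_layout("somedir/someotherdir/somefile", "readonly"))
print(diy_layout("somedir/someotherdir/somefile", "readonly"))
```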

This would work, but the problems are:

* Error prone - if users rename a file or directory, they must rename 
the .metadata subdirectory as well, or things will get out of sync.

* Easy to lose the general view. In this model, metadata and the 
filesystem objects affected by it are completely de-coupled. There is no 
easy way to see which metadata streams are connected to which files or 
directories, especially in large projects.

Consider the example above:

Using the do-it-yourself approach, the manifest will look something like:

.metadata/somedir/someotherdir/somefile/readonly <hexstuff>
.metadata/somedir/someotherdir/somelink/hg_symlink <hexstuff>
somedir/someotherdir/somefile <hexstuff>

If Mercurial supported metadata streams directly, this would rather read 
something like:

somedir/someotherdir/somefile <hexstuff>
somedir/someotherdir/somefile [readonly] <hexstuff>
somedir/someotherdir/somelink [hg:symlink] <hexstuff>

In the first case, it's not easy to see that "somelink" is actually in 
the same directory as "somefile" is. In the second case it is.
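
With the do-it-yourself manifest, a user who wants the second view has 
to reconstruct it. A minimal Python sketch of that regrouping follows; 
the function name is invented and the ".metadata/" prefix is just the 
convention from the example above.

```python
METADATA_PREFIX = ".metadata/"


def group_streams(manifest_paths):
    """Map each filesystem object to the sorted names of its metadata streams."""
    streams = {}
    for path in manifest_paths:
        if path.startswith(METADATA_PREFIX):
            # ".metadata/<object path>/<stream name>"
            obj, _, stream = path[len(METADATA_PREFIX):].rpartition("/")
            streams.setdefault(obj, []).append(stream)
        else:
            # Plain files are listed even when they carry no metadata.
            streams.setdefault(path, [])
    return {obj: sorted(names) for obj, names in streams.items()}


manifest = [
    ".metadata/somedir/someotherdir/somefile/readonly",
    ".metadata/somedir/someotherdir/somelink/hg_symlink",
    "somedir/someotherdir/somefile",
]
for obj, names in sorted(group_streams(manifest).items()):
    print(obj, names)
```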

But the most apparent reason why metadata support should be built in are 
the "hg mv" and "hg copy" operations.

For instance, a

$ hg mv somedir/someotherdir/somefile somedir/someotherdir/othername

would change the revlogs into:

./hg/data/somedir/someotherdir/othername/hg_data.d
./hg/data/somedir/someotherdir/othername/readonly.d
./hg/data/somedir/someotherdir/somelink/hg_symlink.d

Well, OK, it's not a big change... but if we used the do-it-yourself 
method, more changes in different places are needed:

.metadata/somedir/someotherdir/othername/readonly <hexstuff>
.metadata/somedir/someotherdir/somelink/hg_symlink <hexstuff>
somedir/someotherdir/othername <hexstuff>
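
In other words, with the do-it-yourself method every rename has to be 
mirrored by hand (or by yet another script) in the .metadata tree. A 
Python sketch of that extra bookkeeping, with an invented function name 
and handling only the rename of a single file, not of a whole directory:

```python
def rename_with_metadata(manifest_paths, old, new, prefix=".metadata/"):
    """Rename a file and, in the same step, its entries under .metadata."""
    old_meta = prefix + old + "/"
    renamed = []
    for path in manifest_paths:
        if path == old:
            renamed.append(new)
        elif path.startswith(old_meta):
            # Keep the stream name, swap the object path in front of it.
            renamed.append(prefix + new + "/" + path[len(old_meta):])
        else:
            renamed.append(path)
    return sorted(renamed)


manifest = [
    ".metadata/somedir/someotherdir/somefile/readonly",
    ".metadata/somedir/someotherdir/somelink/hg_symlink",
    "somedir/someotherdir/somefile",
]
for path in rename_with_metadata(manifest,
                                 "somedir/someotherdir/somefile",
                                 "somedir/someotherdir/othername"):
    print(path)
```

If the second branch is forgotten, the file moves but its metadata stays 
behind under the old name, which is exactly the out-of-sync problem 
described above.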

> Mercurial stays simple, and your metadata gets handled
> precisely the way your project needs.

I also like the idea that Mercurial stays simple.

So why implement symlinks, or special support for the executable bit?

Forget about it, and provide metadata support instead!

There is nothing more to be done, because then everything else can be 
done by user hooks, which are not part of Mercurial.

It's much like the keyword expansion feature: Not built into Mercurial, 
but available to clients through hook scripts.

So why not do the same trick for symlinks? Or directory attributes? 
Or ACLs? Or line-ending conversion? Or NTFS streams?

> This is also a frequently asked question. From the wiki (BinaryFiles):
> 
> - If you can't autodetect the file type, you will lose.

Actually I *did* read the FAQ before I posted.

But as I pointed out, there are cases when autodetection is simply not 
enough.

And, as metadata support would allow all this to be implemented via user 
hooks, the Mercurial developers need never care about that feature!

So, implementing metadata support can actually save you a lot of 
hassles, because issues like "line ending conversion" or "character set 
conversion" as well as "directory attributes" can all be solved using 
metadata streams and hooks: You will never have to talk about it again.

Think about keyword expansion: Problem solved, and it's not actually 
part of Mercurial!

Why not do the same for symlink support?

>> * Stream metadata. Machines like the Apple Macintosh can use different
>> streams in a file, the so-called "data fork" and "resource fork".
> 
> This is the biggest filesystem misfeature ever and even Apple had the
> good sense to deprecate them. Their primary purpose in Windows land is

I totally agree. It was just an example of what could be done with 
metadata via user hooks.

Without implementing such support directly into Mercurial, that is.

If users *want* it, they can implement it themselves using metadata and 
hooks.

> to introduce security holes. Now I'm going to have nightmares about

Yes. It's a braindamaged feature.

...but some users like it.

> Mercurial invisibly checking in trojans hiding in text files, thanks.

It is not Mercurial that will do anything with such metadata (other than 
version-controlling it).

It's the user hooks which will run the viruses if they are braindamaged 
enough to do it.

Whatever happens in the user's hooks, it's not the responsibility of 
Mercurial any more.

And, by the way: NTFS streams can store any data, including viruses. But 
that does not *run* the viruses.

It's exactly the same as a normal file: It could also contain a virus, 
and be version-controlled by Mercurial.

So what's the matter?

Greetings,
Guenther

