Proposal for simple metadata implementation

Benjamin Pollack benjamin at bitquabit.com
Wed Mar 4 11:49:51 CST 2009


On 3/4/09 10:29 AM, Jesper Nøhr wrote:
> The extension would add 2 new commands to hg, something like 'setprop' 
> and 'getprop'. setprop would set a property on a path, and getprop 
> would print the current value of the properties set on a path. The 
> (key, value) pairs themselves would be stored in a simple plaintext 
> file, e.g. '$(root)/.hgmetadata'. The (key, values) are stored 
> together with a path name, so a simple (unoptimized) format could look 
> like this (.ini based):
>
> [/README]
> property_one = foo
> property_two = bar
> property_one = something else

This implementation has some serious issues.  There is no way to 
distinguish between different files that occupy the same path at 
different points in history, and there is no way to determine the order 
in which properties were added.  For the common cases (MIME types and 
text encodings), we simply want a single value for each property 
associated with the file, which should almost always be the last one 
added.  We have no way of determining the order properties were set 
without walking the file history manually.

Both problems are trivial to solve just by recording the 4-tuple 
(revision, filename, property, value).  You can then easily order the 
properties by date or any other metric, and can distinguish different 
files at the same path by noting that the file did not exist at the 
specified path at the given revision.
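
As a rough sketch of what I mean (not part of the proposal): the record
type, the function name, and the born_at parameter below are all made up
for illustration, and revisions are treated as plain increasing integers,
which glosses over the fact that real history is a DAG.

```python
from collections import namedtuple

# Hypothetical record type mirroring the 4-tuple
# (revision, filename, property, value).
PropSet = namedtuple("PropSet", "revision filename prop value")


def current_value(records, filename, prop, born_at=0):
    """Return the most recently set value of `prop` on `filename`.

    `born_at` is the revision at which the file currently occupying
    the path was added.  Records older than that belong to an earlier,
    unrelated file at the same path and are ignored.
    """
    candidates = [
        r for r in records
        if r.filename == filename
        and r.prop == prop
        and r.revision >= born_at
    ]
    if not candidates:
        return None
    # Simplifying assumption: a higher revision number means "set later".
    return max(candidates, key=lambda r: r.revision).value


# The example from the quoted .ini snippet, with invented revisions.
records = [
    PropSet(1, "/README", "property_one", "foo"),
    PropSet(2, "/README", "property_two", "bar"),
    PropSet(5, "/README", "property_one", "something else"),
]
latest = current_value(records, "/README", "property_one")
```

If /README were deleted and a new, unrelated file added at the same path
in revision 6, passing born_at=6 would skip all three records above.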

I'm not convinced, though, that the above tweak isn't just 
sidestepping a fairly major design flaw.  The ArbitraryMetadata spec, 
and your proposal, both say that .hgmetadata holds a set of values for 
any given key, not simply an individual value; the interpretation of 
that set is left to the tool trying to use the data.  Yet I cannot come 
up with any scenario where retrieving the full set, even with my 
suggested schema change, provides useful information.  If I truly want 
to store a set, I would have to store it as a single key/value, since 
there is no way to delete individual propsets.  This means I cannot 
actually use the set semantics to keep track of, say, bugs that a given 
test file should be checking, or tests that a given file should pass.  
I'd have to store that as a single entry.
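
To make that concrete, here is a minimal sketch assuming an append-only
log of the 4-tuples.  The property name "bugs", the file path, the bug
numbers, and both helper functions are invented for illustration, and
revisions are again plain integers.

```python
# Append-only log of (revision, filename, property, value) tuples.
log = [
    (1, "tests/test_merge.py", "bugs", "1001"),
    (2, "tests/test_merge.py", "bugs", "1002"),
]


def value_set(log, filename, prop):
    """Set semantics: every value ever recorded for the key."""
    return {v for (_rev, f, p, v) in log if f == filename and p == prop}


def latest_value(log, filename, prop):
    """Single-value semantics: the value from the highest revision."""
    hits = [(rev, v) for (rev, f, p, v) in log if f == filename and p == prop]
    return max(hits)[1] if hits else None


# With one entry per member, bug 1001 can never be dropped: nothing
# that is appended later will remove it from value_set().

# Workaround: keep the whole collection in one value and replace it.
packed = log + [(3, "tests/test_merge.py", "bugs", "1002,1003")]
current = set(latest_value(packed, "tests/test_merge.py", "bugs").split(","))
```

Here current holds only the two bugs named in the revision-3 entry, while
value_set() on the same log still reports every value ever recorded,
including the stale ones.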

Yet responding to that criticism takes me back to simply having a 
manifest-style .hgmetadata file that is the canonical list of all 
properties applicable to all files at a given revision.  In so doing, 
you lose the trivial merge semantics that make this design appealing in 
the first place.

Although I would love to have a standard way to keep track of metadata 
in a Mercurial repository, I'm not convinced that this solution scales.  
I think perhaps attacking this on a more specific case-by-case basis for 
metadata problems we already have--such as text file encodings--might be 
wiser.

--Benjamin

