Proposal for simple metadata implementation
Benjamin Pollack
benjamin at bitquabit.com
Wed Mar 4 11:49:51 CST 2009
On 3/4/09 10:29 AM, Jesper Nøhr wrote:
> The extension would add 2 new commands to hg, something like 'setprop'
> and 'getprop'. setprop would set a property on a path, and getprop
> would print the current value of the properties set on a path. The
> (key, value) pairs themselves would be stored in a simple plaintext
> file, e.g. '$(root)/.hgmetadata'. The (key, values) are stored
> together with a path name, so a simple (unoptimized) format could look
> like this (.ini based):
>
> [/README]
> property_one = foo
> property_two = bar
> property_one = something else
This implementation has some serious issues. There is no way to
distinguish between different files that occupy the same path at
different points in history, and there is no way to determine the
order in which properties were set without walking the file history
manually. For the common cases (MIME types and text encodings), we
simply want a single value for each property associated with the
file, which should almost always be the last one set.
Both problems are trivial to solve just by recording the 4-tuple
(revision, filename, property, value). You can then easily order the
properties by date or any other metric, and can distinguish different
files at the same path by noting that the file did not exist at the
specified path at the given revision.
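A minimal sketch of the 4-tuple idea, not an actual extension. The
names PropRecord, latest_value, and created_at are hypothetical, and
revisions are modelled as plain integers, which simplifies away the
fact that Mercurial history is a DAG. The caller is assumed to know
the revision at which the current file first appeared at the path.

```python
from typing import Iterable, NamedTuple, Optional


class PropRecord(NamedTuple):
    """One property assignment: (revision, filename, property, value)."""
    revision: int   # assumption: a simple integer revision number
    filename: str
    prop: str
    value: str


def latest_value(records: Iterable[PropRecord], filename: str, prop: str,
                 created_at: int = 0) -> Optional[str]:
    """Return the most recently set value of `prop` for `filename`.

    Records older than `created_at` are ignored, on the assumption that
    they belong to an earlier, different file that occupied the same path.
    """
    candidates = [r for r in records
                  if r.filename == filename
                  and r.prop == prop
                  and r.revision >= created_at]
    if not candidates:
        return None
    # Ordering by revision recovers "the last one set".
    return max(candidates, key=lambda r: r.revision).value


# The properties from the quoted example, with made-up revision numbers.
records = [
    PropRecord(1, "/README", "property_one", "foo"),
    PropRecord(1, "/README", "property_two", "bar"),
    PropRecord(4, "/README", "property_one", "something else"),
]
```

Here latest_value(records, "/README", "property_one") picks the value
set at revision 4, and passing created_at=5 treats all three records
as belonging to an older file at that path.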
I'm not convinced, though, that the above tweak isn't just
sidestepping a fairly major design flaw. The ArbitraryMetadata spec,
and your proposal, both say that .hgmetadata holds a set of values for
any given key, not simply an individual value; the interpretation of
that set is left to the tool trying to use the data. Yet I cannot come
up with any scenario where retrieving the full set, even with my
suggested schema change, provides useful information. If I truly want
to store a set, I would have to store it as a single key/value, since
there is no way to delete individual propsets. This means I cannot
actually use the set semantics to keep track of, say, bugs that a given
test file should be checking, or tests that a given file should pass.
I'd have to store that as a single entry.
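To illustrate the limitation, here is a small model of an append-only
property log. The function names borrow setprop from the proposal, but
the functions, the "bugs" key, and the comma-separated encoding are
assumptions made only for this sketch.

```python
def setprop(log, key, value):
    """Append a (key, value) pair; nothing is ever removed."""
    log.append((key, value))


def getprop_set(log, key):
    """Set semantics: every value ever recorded for the key."""
    return {v for k, v in log if k == key}


def getprop_last(log, key):
    """Single-value semantics: the most recently recorded value."""
    values = [v for k, v in log if k == key]
    return values[-1] if values else None


# Set semantics: one entry per bug. Once bug 101 no longer applies,
# there is no operation that takes it back out of the set.
as_set = []
setprop(as_set, "bugs", "101")
setprop(as_set, "bugs", "102")
stale = getprop_set(as_set, "bugs")          # still contains "101"

# Workaround: encode the whole set in a single entry and overwrite it.
as_entry = []
setprop(as_entry, "bugs", "101,102")
setprop(as_entry, "bugs", "102")             # drops bug 101
current = set(getprop_last(as_entry, "bugs").split(","))
```

The second form works, but only by ignoring the set semantics and
relying on the last value alone.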
Yet responding to that criticism takes me back to simply having a
manifest-style .hgmetadata file that is the canonical list of all
properties applicable to all files at a given revision. In so doing,
you lose the trivial merge semantics that make this design appealing in
the first place.
Although I would love to have a standard way to keep track of metadata
in a Mercurial repository, I'm not convinced that this solution scales.
I think perhaps attacking this on a more specific case-by-case basis for
metadata problems we already have--such as text file encodings--might be
wiser.
--Benjamin