[PATCH 3 of 3 RFC] templater: add 'transplant_source' keyword

Sun Sep 19 15:56:24 CDT 2010

  Dan Villiom Podlaski Christiansen wrote, On 09/19/2010 09:08 PM:
> # HG changeset patch
> # User Dan Villiom Podlaski Christiansen<danchr at gmail.com>
> # Date 1283432861 -7200
> # Node ID 5697656e9aaba1374d3e1c7bb79dfa15c7e2cb70
> # Parent  2be04fa93c4d8804814df7143c4f9864f60eb6bf
> templater: add 'transplant_source' keyword.
> * * *
> templater: add 'convert_source' keyword.
>
> diff --git a/mercurial/help/templates.txt b/mercurial/help/templates.txt
> --- a/mercurial/help/templates.txt
> +++ b/mercurial/help/templates.txt
> @@ -66,6 +66,10 @@ keywords are usually available for templ
>
>   :latesttagdistance: Integer. Longest path to the latest tag.
>
> +:extra: Mapping. The extra metadata of the changeset. Use "." to get the
> +    individual items. For instance, {extra.branch} gets the branch name,
> +    but does not filter out default.
> +
>   The "date" keyword does not produce human-readable output. If you
>   want to use a date in your output, you can use a filter to process
>   it. Filters are functions which return a string based on the input
> diff --git a/mercurial/templatekw.py b/mercurial/templatekw.py
> --- a/mercurial/templatekw.py
> +++ b/mercurial/templatekw.py
> @@ -239,6 +239,44 @@ def showrev(repo, ctx, templ, **args):
>   def showtags(**args):
>       return showlist('tag', args['ctx'].tags(), **args)
>
> +def showextra(repo, ctx, templ, **args):
> +    def isbinary(s):
> +        '''
> +        Improved, UTF-8-aware heuristic for short strings.
> +
> +        The usual heuristic (does the string contain '\0'?) is insufficient for
> +        short strings. Instead, we use the following:
> +
> +        - Is the string valid UTF-8?

AFAIK people and extensions are free to put stuff in extra fields that 
are neither utf-8 nor "binary".
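
To make that concrete, here is a rough Python 3 transliteration of the
two helpers from the patch, run against two made-up extra values. The
values are hypothetical; the point is only that ordinary 20-byte text
ends up hexified as if it were a node id.

```python
# Python 3 transliteration of the patch's helpers, for illustration only.
from binascii import hexlify

# The 32 ASCII control characters below ' ' (space).
UNPRINTABLE = {chr(c) for c in range(ord(' '))}

def isbinary(s: bytes) -> bool:
    """True if s is not valid UTF-8 or contains an ASCII control character."""
    try:
        return not UNPRINTABLE.isdisjoint(s.decode('utf-8'))
    except UnicodeDecodeError:
        return True

def unbin(s: bytes) -> bytes:
    """Hexify anything that is 20 bytes long and looks binary."""
    if len(s) == 20 and isbinary(s):
        return hexlify(s)
    return s

# Hypothetical extra values, both exactly 20 bytes long.
latin1_note = 'réviser avant fusion'.encode('latin-1')  # text, but not UTF-8
tabbed_note = b'name\tvalue\tmore text'                   # text with tabs

for value in (latin1_note, tabbed_note):
    print(value, '->', unbin(value))
```

Both values are text as far as their author is concerned, yet both come
back as 40 hex digits.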

> +        - If so, does it contain any of the 32 ASCII control characters?
> +
> +        Combined, the two make false negatives reasonably unlikely.

Can you quantify this claim?
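
One way to put a number on it, as a sketch: assume a binary node id is
20 uniformly random bytes and count how many 20-byte strings are valid
UTF-8 with no character below U+0020, i.e. would slip through as
"text". The count below follows strict UTF-8 (no overlong forms, no
surrogates), which is what Python 3's decoder enforces; the function
names are made up for this example.

```python
from fractions import Fraction

# Number of acceptable code points per encoded width in strict UTF-8.
ENCODINGS = {
    1: 0x80 - 0x20,              # ASCII minus the 32 control characters
    2: 0x800 - 0x80,             # U+0080..U+07FF
    3: 0x10000 - 0x800 - 0x800,  # U+0800..U+FFFF minus the surrogates
    4: 0x110000 - 0x10000,       # U+10000..U+10FFFF
}

def count_textlike(n: int) -> int:
    """Count n-byte strings the heuristic would accept as text."""
    counts = [1] + [0] * n  # counts[i] = acceptable strings of i bytes
    for i in range(1, n + 1):
        counts[i] = sum(ways * counts[i - width]
                        for width, ways in ENCODINGS.items() if width <= i)
    return counts[n]

def miss_probability(n: int = 20) -> Fraction:
    """Chance that n uniformly random bytes are taken for text."""
    return Fraction(count_textlike(n), 256 ** n)

print(float(miss_probability(20)))
```

Under those assumptions the miss rate comes out on the order of 1e-7
per node id. That only covers binary taken for text; it says nothing
about the opposite error, where text is taken for binary.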

> +        '''
> +        # ' ', or space, is the first printable character in ASCII
> +        unprintable = set(unichr(c) for c in xrange(ord(' ')))
> +
> +        try:
> +            return not unprintable.isdisjoint(s.decode('utf-8'))
> +        except UnicodeDecodeError:
> +            return True
> +
> +    def unbin(s):
> +        '''
> +        ASCIIfy binary node identifiers, such as those stored by transplant.
> +
> +        Ideally, we wouldn't do this here. On the one hand, we'd prefer to
> +        have users use a filter to get the hexadecimal revision identifier. On
> +        the other hand, binary data is - by definition - not text and therefore
> +        not printable. Requiring the user to always specify a filter is not
> +        a good UI; if possible we should handle that internally.
> +        '''
> +        if len(s) == 20 and isbinary(s):
> +            return hex(s)
> +        else:
> +            return s
> +
> +    return dict((k, unbin(v)) for k, v in ctx.extra().iteritems())

For the record: I don't like this heuristic. We don't know the type of 
these values and we can't guess them reliably, so we shouldn't try to.

I think it would be better to add hex and short filters and use them
explicitly in the templates when necessary.
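
Roughly what I have in mind, as a standalone sketch rather than actual
templater code. The names hexfilter and shortfilter are hypothetical,
as is the template spelling in the comment; the 12-digit prefix simply
mirrors the usual short form of a changeset id.

```python
from binascii import hexlify

def hexfilter(value: bytes) -> str:
    """Render a binary node id as its full hexadecimal form."""
    return hexlify(value).decode('ascii')

def shortfilter(value: bytes) -> str:
    """Render a binary node id as a 12-digit hexadecimal prefix."""
    return hexfilter(value)[:12]

# A made-up 20-byte node id standing in for a transplant_source value.
# In a template this might be spelled {extra.transplant_source|hex}
# (hypothetical syntax).
node = bytes(range(20))
print(hexfilter(node))
print(shortfilter(node))
```

With that, the template author states the type once, and no value gets
rewritten behind the user's back.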

Besides that:

convert_revision is stored hexified, while transplant_source is binary.
Isn't the latter a bug that should simply be fixed?

hg log -v --debug (and {extras}) will show the extra fields too. 
Whatever the solution is, isn't the same solution needed there?

/Mads