[PATCH 2 of 3] revset: transcode revsets to UTF-8
mg at lazybytes.net
Mon Nov 15 17:43:35 CST 2010
Matt Mackall <mpm at selenic.com> writes:
> On Fri, 2010-11-12 at 17:40 +0100, Dan Villiom Podlaski Christiansen
>> # HG changeset patch
>> # User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
>> # Date 1289579971 -3600
>> # Node ID 0fa148bcfe7f0755236e4b9d0034c5cc7ac4771d
>> # Parent bdf95be4ea789a13d088f0955ffcd072590a1eb6
>> revset: transcode revsets to UTF-8.
>> This allows updating to a branch with non-ASCII names in non-UTF-8
> This doesn't quite mesh with our encoding philosophy,
Well, it works just the same for commit:
% echo >> a.txt && hg commit -m bøb
abort: decoding near 'bøb': 'ascii' codec can't decode byte 0xf8 in
position 1: ordinal not in range(128)!
% echo >> a.txt && LC_ALL=en_US.UTF-8 hg commit -m bøb
abort: decoding near 'bøb': 'utf8' codec can't decode byte 0xf8 in
position 1: invalid start byte!
% echo >> a.txt && LC_ALL=en_US.ISO8859-1 hg commit -m bøb
This has always seemed quite right to me: we take the bytes given by the
user and decode them using his locale. If we cannot do this, then we
abort and give the user a chance to fix things.
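The rule in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mercurial's code: `Abort` and `from_local` are hypothetical names, and the local encoding is passed in explicitly instead of being read from the environment. The input `b"b\xf8b"` is "bøb" as typed in a Latin-1 terminal, as in the session above, so the ASCII and UTF-8 cases fail and the Latin-1 case succeeds.

```python
class Abort(Exception):
    """Hypothetical stand-in for an 'abort: ...' error shown to the user."""


def from_local(data: bytes, encoding: str) -> bytes:
    """Decode user input with the locale's encoding, then re-encode as UTF-8.

    Decoding is strict: if the bytes are not valid in the given encoding,
    we abort instead of guessing, so the user can fix the locale or input.
    """
    try:
        text = data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError as err:
        raise Abort("decoding near %r: %s!" % (data, err)) from err
    return text.encode("utf-8")


message = b"b\xf8b"  # "bøb" typed in a Latin-1 terminal

for enc in ("ascii", "utf-8", "latin-1"):
    try:
        print(enc, "->", from_local(message, enc))
    except Abort as err:
        print(enc, "-> abort:", err)
```

Only the Latin-1 case produces a result, the UTF-8 bytes `b'b\xc3\xb8b'`; the other two raise `Abort`, mirroring the two aborts in the shell session.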
> which can be summed up as "restrict encoding-aware code to the
> smallest set possible". If revset can't look up non-ASCII branch names
> in a Latin1 locale, then that means that branch lookup is broken, not
> that revsets needs to become encoding-aware.
> Related: how should lookup work for names that can't be represented in
> the local charset? Answer: if hg branches shows "caf?" rather
> than "café", then I should be able to "hg up caf?".
That sounds bad to me: the immediate question that arises is what to
do if there is a branch named 'caf?' with a "real" question mark.
I think the current behavior is fine: we make a best effort when
converting the metadata into the user's local encoding, and we degrade
gracefully by letting Python substitute characters outside of the
encoding with '?'.
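Both halves of this argument can be seen in a short Python sketch. It assumes, for illustration, that branch names are kept as UTF-8 bytes internally; `to_local` and `lookup_by_display_name` are hypothetical helpers, not Mercurial functions. `to_local` does the best-effort conversion with Python's built-in "replace" error handler, and `lookup_by_display_name` models the proposed "hg up caf?" behavior by matching on the displayed name.

```python
from typing import List


def to_local(name_utf8: bytes, encoding: str) -> bytes:
    """Best-effort conversion of stored UTF-8 metadata for display.

    Characters the local encoding cannot represent become '?' via
    Python's built-in "replace" error handler instead of raising.
    """
    return name_utf8.decode("utf-8").encode(encoding, "replace")


def lookup_by_display_name(query: bytes, branches: List[bytes], encoding: str) -> List[bytes]:
    """Hypothetical lookup that matches the query against displayed names."""
    return [b for b in branches if to_local(b, encoding) == query]


cafe = "café".encode("utf-8")
branches = [cafe, b"caf?"]  # an accented name and one with a real '?'

for enc in ("utf-8", "latin-1", "ascii"):
    print(enc, to_local(cafe, enc))

# In an ASCII locale both branches are displayed as b'caf?', so a lookup
# on the displayed name cannot tell them apart.
print(lookup_by_display_name(b"caf?", branches, "ascii"))
```

In the ASCII locale the final lookup returns both branches, which is exactly the ambiguity raised above; in a Latin-1 locale "café" displays as `b'caf\xe9'` and only the branch with the real question mark matches.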
Mercurial links: http://mercurial.ch/
More information about the Mercurial-devel