[PATCH 4 of 4] changelog: lazy decode user (API)

Yuya Nishihara yuya at tcha.org
Wed Mar 2 09:55:13 EST 2016


On Tue, 01 Mar 2016 13:22:20 -0600, Matt Mackall wrote:
> On Sat, 2016-02-27 at 23:27 -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1456641258 28800
> > #      Sat Feb 27 22:34:18 2016 -0800
> > # Node ID ee98b780730118e8a8948396507633a0460c154e
> > # Parent  8427442ba08dd8dc324ea9e1fd30f65c89b2b753
> > changelog: lazy decode user (API)
> > 
> > This appears to show a similar speedup as the previous patch.
> 
> These two scare me (and are against our encoding conventions).
> 
> I like the idea of being lazy here and I've definitely seen the hit for this in
> profiles, but I worry that this will leak utf-8 data to users that are expecting
> local strings and we won't discover the problem until some end user runs it on a
> non-utf-8 system months down the road.
> 
> Because these sorts of encoding confusions are very hard to keep track of in a
> weakly-typed system, our rule has always been: limit the exposure of the system to
> the secondary types as far as possible. Which is why ALL changelog
> encoding/decoding is handled today in just a couple functions in changelog.py
> and we mostly don't have to think about it.

FWIW, extra is binary. It's unfortunate that we can't convert it to localstr.

  $ hg branch À
  $ hg ci -m branch
  $ HGENCODING=latin1 hg log -T '{get(extras, "branch")} {branch}\n'
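
A rough Python sketch of the mismatch the commands above are meant to expose,
assuming the branch name is stored as UTF-8 and that extras are handed back as
raw bytes. The helper to_local() is a hypothetical stand-in for illustration,
not Mercurial's actual conversion code:

```python
# Hypothetical sketch of the encoding mismatch; not Mercurial's implementation.
STORED_ENCODING = "utf-8"  # assumption: changelog data is stored as UTF-8


def to_local(stored: bytes, local_encoding: str) -> bytes:
    """Convert stored UTF-8 bytes to the user's local encoding (hypothetical helper)."""
    return stored.decode(STORED_ENCODING).encode(local_encoding, "replace")


# Branch name "A with grave accent" (U+00C0) as it would sit in storage.
branch_stored = "\u00c0".encode(STORED_ENCODING)

# Path 1: converted to the local encoding before reaching the user.
converted = to_local(branch_stored, "latin-1")

# Path 2: raw binary value, passed through without conversion.
leaked = branch_stored

print(converted)  # one byte: the latin-1 form of the character
print(leaked)     # two bytes: the UTF-8 form, mojibake on a latin-1 terminal
```

On a latin-1 system the first value renders as the intended character, while
the second renders as two unrelated characters. That is the kind of leak that
goes unnoticed on UTF-8 systems.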

