[PATCH 5 of 6] check-code: detect "missing _() in ui message" more exactly

Yuya Nishihara yuya at tcha.org
Thu Jun 2 09:36:09 EDT 2016


On Wed, 1 Jun 2016 12:25:09 -0400, timeless wrote:
> Yuya Nishihara wrote:
> > timeless wrote:  
> > > FUJIWARA Katsunori wrote:  
> > > > This patch also applies "()" instead of "_()" on messages below to
> > > > hide false-positives:  
> > >
> > > I'd really rather have a function for this. See the other thread where
> > > you wanted to remove `_()`.  
> >
> > I don't get it. Why do we need a function for py3k? Can you elaborate?  
> 
> While we like to think about localized strings as Unicode and unlocalized
> strings as bytes,

Why do they have to be different? I think it would be simpler if they were
both bytes (or both unicode, if we want).

> at the end of the day our output stream (stdout/stderr)
> can only be one or the other. It's going to be Unicode, having some bytes
> sent to the thing that tries to generate output in Unicode means that in
> one code path we're stuck doing a conversion. We can do it there (in warn),
> I suppose, but I don't think we are today, and in some ways it's easier to
> just have all callers provide the same encoding.

I'm not sure I understand what you mean, but if you're going to make the
underlying stdout/stderr streams accept unicode, that is wrong. We need raw
streams because we have to process binary data, such as reading/writing
patches from/to stdio. Encoding conversion is lossy even if it raises no error.
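
To illustrate the last point, here is a minimal Python 3 sketch. The
`roundtrip` helper and the sample patch line are hypothetical, made up only
for this illustration; they are not taken from Mercurial.

```python
def roundtrip(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode bytes to text and encode them back, as a text-mode stream might."""
    # errors="replace" suppresses UnicodeDecodeError, so nothing is raised
    # even when the input is not valid in the given encoding.
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding)


# Hypothetical patch line containing a Latin-1 encoded byte (0xE9).
patch = b"+caf\xe9\n"
result = roundtrip(patch)

# No exception was raised, yet the bytes are no longer the original ones:
# the invalid byte was silently replaced by U+FFFD (encoded as EF BF BD).
print(patch == result)
print(result)
```

A patch passed through such a stream would no longer apply to the original
file, which is why the stream carrying patch data has to stay binary.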

So, ui.fout and ui.write() should accept bytes. We could make ui.warn(),
.note(), etc. accept unicode, but mixing bytes and unicode in a dynamically
typed language would be a source of trouble. Also, in my experience, many
developers understand neither Unicode nor character encodings.
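
A rough Python 3 sketch of both points follows. The `write` function here is
a hypothetical stand-in, not Mercurial's actual ui.write(), and `io.BytesIO`
is assumed in place of a binary stdout.

```python
import io


def write(fout, *args: bytes) -> None:
    """Write byte strings to a binary stream without any conversion."""
    for a in args:
        fout.write(a)


fout = io.BytesIO()  # stand-in for a raw, binary stdout
write(fout, b"diff --git a/f b/f\n", b"+caf\xe9\n")
written = fout.getvalue()  # exactly the bytes passed in, untouched

# Mixing the two types is where the trouble starts: in Python 3,
# concatenating bytes and str raises TypeError, and only at run time.
try:
    b"abort: " + "message"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Nothing flags the mixed concatenation before the code path is actually
executed, so such a bug can hide in a rarely used error message.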


More information about the Mercurial-devel mailing list