[PATCH RFC V2] hgweb: code selection without line numbers in file source view

Matt Mackall mpm at selenic.com
Mon Jul 1 12:46:15 CDT 2013


On Mon, 2013-07-01 at 12:02 +0200, Laurens Holst wrote:
> Op 01-07-13 01:06, Matt Mackall schreef:
> > On Sun, 2013-06-30 at 20:44 +0200, Laurens Holst wrote:
> >> Op 29-6-2013 23:09, Alexander Plavin schreef:
> >>>>> +  <li class="parity{parity} source" id="{lineid}"><div>{ifeq(strip(line,
> >>>>> '\r\n'), '', '&nbsp;', '')}{strip(line|escape, "\r\n")}</div></li>'
> >>>> I would use \u00A0 instead of &nbsp;.
> >>> Why is this better?
> >> I don’t like the entities. <Insert some irrelevant reason about XML not
> >> being able to understand HTML’s extended set of entities without a DTD
> >> even though this is neither XML nor XHTML>. And it’s more bytes.
> >>
> >> In a Unicode world, no reason to not just output the actual character
> >> you want.
> > Quiz: what happens when hgweb's output encoding is set to Windows
> > CP1252?
> 
> Presumably it’d be fine, cause the non-breaking space also exists in 
> CP1252 at the same code point...

The hgweb templates, like everything else in Mercurial, are byte
strings. They must be ASCII to work in all the encodings that hgweb
might be configured to use. Generally you'll want to configure it to
match the encoding of the source files you're tracking.

You could stick \xA0 in there, but you'll get random results in code
pages that don't happen to coincide nicely with Latin1. Whereas &nbsp;
continues to work when you're using, say, Shift-JIS or UTF-8.
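
To make that concrete, here is a minimal Python 3 sketch (not hgweb code; the helper name decode_or_none and the choice of encodings are only for illustration) of what a lone 0xA0 byte in a template turns into under a few output encodings, next to the ASCII-only entity:

```python
# What a raw 0xA0 byte means depends entirely on the encoding the
# browser is told to use; the ASCII bytes of "&nbsp;" do not.
RAW_NBSP = b"\xa0"
ENTITY = b"&nbsp;"

def decode_or_none(data, encoding):
    """Decode data, returning None if the bytes are invalid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Illustrative selection; any configured hgweb encoding could go here.
ENCODINGS = ["latin-1", "cp1252", "cp437", "shift_jis", "utf-8"]

raw_results = {enc: decode_or_none(RAW_NBSP, enc) for enc in ENCODINGS}
entity_results = {enc: decode_or_none(ENTITY, enc) for enc in ENCODINGS}

for enc in ENCODINGS:
    print(enc, repr(raw_results[enc]), repr(entity_results[enc]))
```

Under Latin1 and CP1252 the byte happens to be NBSP, under CP437 it is a different character entirely, and on its own it is not valid UTF-8 at all. The entity decodes to the same six ASCII characters in every case.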

(You can't actually stick "\u00a0" in there, because the templates aren't
Unicode strings and are not transcoded to the local encoding before
being output... which would have its own big problems, as Python's
codecs have no provisions for approximately handling the nonexistence of
NBSP in lots of encodings.)
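
A rough sketch of that problem in Python 3 terms, assuming a hypothetical transcoding step (hgweb does no such thing, as noted above) and using ASCII as the simplest encoding that lacks NBSP: the strict codec simply refuses, and the stock "replace" handler substitutes a question mark rather than anything space-like.

```python
NBSP = "\u00a0"

# ASCII has no NBSP, so strict encoding raises.
try:
    NBSP.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The generic fallback is "?", not an ordinary space.
replaced = NBSP.encode("ascii", errors="replace")

print(strict_ok, replaced)
```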

> > Quiz: what percentage of developers do you expect to have the _UTF-8_
> > byte sequence (never mind the raw Unicode) for nbsp memorized?
> 
> I was mostly thinking when looking at the source view it would look 
> cleaner, instead of having entities all over.
> 
> Also, I don’t know what the UTF-8 byte sequence has to do with anything 
> unless someone’d be looking at the HTML source with a hex editor.

I don't care about people viewing source, I care about our own
developers who will have to know what \xc2\xa0 means when hacking on
templates. Which, if we wrongly assume everyone's using Unicode, is what
we'd be sticking in the templates.
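
For reference, a small Python 3 sketch of where those bytes come from and what they turn into when the output encoding isn't UTF-8 after all (CP1252 is picked here only as an example):

```python
# NBSP encoded as UTF-8 is the two-byte sequence C2 A0.
utf8_nbsp = "\u00a0".encode("utf-8")

# The same two bytes read as CP1252 become U+00C2 followed by NBSP.
misread = utf8_nbsp.decode("cp1252")

print(utf8_nbsp, [hex(ord(ch)) for ch in misread])
```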

> > Quiz: how often do you use \x0a rather than \n when coding? how often is
> > it the right thing to do?
> 
> Well \n still outputs the actual character.

Except when it doesn't (on Windows, for instance). But that's not my
point. My point is that sticking hex codes in your source is generally
considered obfuscation.
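
The Windows case, sketched in Python 3 with an in-memory stream standing in for a text-mode file (passing newline="\r\n" is an assumption used here to mimic the Windows default on any platform):

```python
import io

buf = io.BytesIO()
# newline="\r\n" makes the wrapper translate "\n" on write, which is
# what text-mode files do by default on Windows.
out = io.TextIOWrapper(buf, encoding="ascii", newline="\r\n")
out.write("line\n")
out.flush()

written = buf.getvalue()
print(written)
```

The string contains one "\n", but the bytes that reach the stream are a carriage return followed by a line feed.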

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list