[PATCH rfc] check-code: detect r'...\'...'

Wed Nov 9 11:19:18 CST 2011

Mads Kiilerich <mads at kiilerich.com> writes:

> On 11/09/2011 03:31 PM, Matt Mackall wrote:
>> On Wed, 2011-11-09 at 02:37 +0100, Mads Kiilerich wrote:
>>> # HG changeset patch
>>> # User Mads Kiilerich<mads at kiilerich.com>
>>> # Date 1320802647 -3600
>>> # Node ID 20fe74e6bcd0518b108f85270c880f330963c490
>>> # Parent  de7e2fba4326cad80bda0cb100d2ae2f58e67ee8
>>> check-code: detect r'...\'...'
>>
>> Not sure why this matters:
>>
>>>>> a = r'x\'x'
>>>>> b = 'x\'x'
>>>>> a == b
>> False
>>>>> b.replace(a, 'foo') # shouldn't work
>> "x'x"
>>>>> import re
>>>>> re.sub(a, 'foo', b) # should work
>> 'foo'
>>
>> Despite the fact that r'x\'x' is really "x\\'x", the regex engine
>> turns "\\'" back into a "'" and the expression functions as expected.
>
> You are right. Assuming the raw strings are used as regexps, for
> all(?) practical purposes it doesn't matter that there are extra \'s in
> the string. It just looks confusing and sloppy, IMHO.

I agree on the sloppiness.
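
For reference, the session quoted above can be replayed as a small
script. This is only a sketch of the behaviour being discussed; the
variable names (raw, plain, clean) are mine and not from the patch.

```python
import re

raw = r'x\'x'     # four characters: x, backslash, quote, x
plain = 'x\'x'    # three characters: x, quote, x
clean = r"x'x"    # same text as plain, written without the backslash

# The two literals are different strings.
assert len(raw) == 4 and len(plain) == 3
assert raw != plain

# As a literal substring, the raw form is not found in the plain one,
# so str.replace leaves the input unchanged.
assert plain.replace(raw, 'foo') == plain

# As a regex, the escaped quote matches a bare quote, so the extra
# backslash is harmless there and the clean spelling behaves the same.
assert re.sub(raw, 'foo', plain) == 'foo'
assert re.sub(clean, 'foo', plain) == 'foo'
```

So the backslash only goes unnoticed as long as the string ends up in
the regex engine.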

But there's also another small reason when it comes to docstrings:
they're extracted by i18n/hggettext, and to add line numbers to the .po
file it searches for the docstring in the original .py file. If there
are extra backslashes in the docstring, it cannot find the docstring
and so defaults to a line number of 1.

Not a disaster, but it is nice for a translator to be able to jump back
to the right spot in the .py file.
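
To illustrate the lookup problem, here is a minimal sketch. It is not
the actual hggettext code: docstring_line and the two sample sources
are hypothetical, and I'm assuming a plain substring search with a
fallback of 1.

```python
def docstring_line(source, doc, default=1):
    """Return the 1-based line where doc starts in source, or default."""
    pos = source.find(doc)
    if pos == -1:
        # The evaluated docstring does not appear verbatim in the file.
        return default
    return source.count('\n', 0, pos) + 1

# File text with an unnecessary backslash before the quote.
sloppy_source = (
    "def f():\n"
    "    pass\n"
    "\n"
    "def g():\n"
    "    '''don\\'t panic'''\n"
)
# The same file text without the backslash.
clean_source = sloppy_source.replace("\\'", "'")

# What Python sees as g.__doc__ in both cases.
doc = "don't panic"

assert docstring_line(sloppy_source, doc) == 1   # not found, falls back
assert docstring_line(clean_source, doc) == 5    # found on the right line
```

The evaluated docstring has lost the backslash, so it no longer occurs
verbatim in the sloppy file and the search comes back empty.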

So if the rule works well, then I think we should put it in.
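
Just to make the idea concrete, a rule of this kind could look roughly
like the sketch below. The pattern and the has_escaped_quote helper are
hypothetical and are not the ones from Mads' patch; it only looks at
single-line, single-quoted raw literals.

```python
import re

# Hypothetical pattern: a raw single-quoted literal on one line, where a
# backslash followed by any character is consumed as one unit.
RAW_SQ = re.compile(r"""(?<![A-Za-z0-9_])[rR]'(?:\\.|[^'\\\n])*'""")

def has_escaped_quote(line):
    """Return True if a raw '...' literal on this line contains \\'."""
    return any("\\'" in m.group(0) for m in RAW_SQ.finditer(line))

assert has_escaped_quote("a = r'x\\'x'")        # r'x\'x' is flagged
assert not has_escaped_quote("a = r'x\\dx'")    # other escapes are fine
assert not has_escaped_quote("a = 'x\\'x'")     # not a raw string
assert not has_escaped_quote('a = r"x\'x"')     # double-quoted raw string
```

Whether something this simple gives too many false positives on the
real tree is exactly the "works well" question.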

-- 
Martin Geisler

Mercurial links: http://mercurial.ch/