[PATCH 2 of 2 STABLE] help: search section of help topic by translated section name correctly

timeless timeless at gmail.com
Mon May 16 11:37:48 EDT 2016


I'm pretty sure it's possible to write a regular expression that looks
for any .lower()/.upper() call not preceded by "encoding"; I think a negative
lookbehind should work. I'll try it in an hour or so.
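A minimal Python 3 sketch of the negative-lookbehind idea. The pattern `CASEFOLD_RE` and the helper `find_casefold_calls()` are hypothetical illustrations, not an existing check-code.py rule. The rule flags every direct `.lower()`/`.upper()` call, including harmless ASCII-only ones, which is exactly the false-positive problem raised in the quoted reply.

```python
import re

# Hypothetical check-code-style rule (not an actual Mercurial rule):
# report ".lower(" / ".upper(" unless the dot is immediately preceded by
# "encoding", so encoding.lower(s) passes but s.lower() is reported.
CASEFOLD_RE = re.compile(r'(?<!encoding)\.(lower|upper)\(')


def find_casefold_calls(source):
    """Return (lineno, stripped line) pairs that call .lower()/.upper() directly."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if CASEFOLD_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = '''\
section = section.lower()
section = encoding.lower(section)
x = encoding.upper(x).strip('"')
name = name.upper()
'''

for lineno, line in find_casefold_calls(sample):
    print(lineno, line)
# prints:
# 1 section = section.lower()
# 4 name = name.upper()
```

Line 4 may be perfectly safe if `name` is always ASCII, but a purely lexical rule cannot tell, so some per-line exemption mechanism would still be needed.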
On May 16, 2016 2:20 AM, "FUJIWARA Katsunori" <foozy at lares.dti.ne.jp> wrote:

>
> At Sun, 15 May 2016 20:44:46 -0700,
> timeless wrote:
> >
> > The changes seem OK, but this is begging for a pair of check-code tests
> > (one for lower, and one for upper, each objecting to anything that isn't
> > encoding.lower()/encoding.upper()).
>
> Yes, I also think that detection by check-code.py is better.
>
> But there are many '.lower()'/'.upper()' invocations in Mercurial
> source code, and I don't have a good idea for picking out only the invalid
> '.lower()'/'.upper()' invocations (or a low-cost trick to hide
> false-positive cases).
>
> Would you have any good ideas?
>
>
> > On May 12, 2016 7:12 PM, "FUJIWARA Katsunori" <foozy at lares.dti.ne.jp> wrote:
> >
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > > # Date 1463091599 -32400
> > > #      Fri May 13 07:19:59 2016 +0900
> > > # Branch stable
> > > # Node ID aaabed77791a75968a12b8c43ad263631a23ee81
> > > # Parent  9d38a2061fd8a7a4fd80ead8d5798f38b359bfe3
> > > help: search section of help topic by translated section name correctly
> > >
> > > Before this patch, "hg help topic.section" might show an unexpected
> > > section of the help topic in some encodings.
> > >
> > > It applies str.lower() instead of encoding.lower(str) to the translated
> > > message to search sections case-insensitively, but some encodings use
> > > 0x41(A) - 0x5a(Z) as the second or later byte of a multi-byte character
> > > (for example, ja_JP.cp932), and str.lower() then corrupts such characters.
> > >
> > > To search sections of a help topic by translated section name correctly,
> > > this patch replaces str.lower() with encoding.lower(str) for both the query
> > > string (in commands.help()) and the translated help text (in
> > > minirst.getsections()).
> > >
> > > diff --git a/mercurial/commands.py b/mercurial/commands.py
> > > --- a/mercurial/commands.py
> > > +++ b/mercurial/commands.py
> > > @@ -4590,7 +4590,7 @@ def help_(ui, name=None, **opts):
> > >      subtopic = None
> > >      if name and '.' in name:
> > >          name, section = name.split('.', 1)
> > > -        section = section.lower()
> > > +        section = encoding.lower(section)
> > >          if '.' in section:
> > >              subtopic, section = section.split('.', 1)
> > >          else:
> > > diff --git a/mercurial/minirst.py b/mercurial/minirst.py
> > > --- a/mercurial/minirst.py
> > > +++ b/mercurial/minirst.py
> > > @@ -724,7 +724,7 @@ def getsections(blocks):
> > >              x = b['key']
> > >          else:
> > >              x = b['lines'][0]
> > > -        x = x.lower().strip('"')
> > > +        x = encoding.lower(x).strip('"')
> > >          if '(' in x:
> > >              x = x.split('(')[0]
> > >          return x
> > > diff --git a/tests/test-help.t b/tests/test-help.t
> > > --- a/tests/test-help.t
> > > +++ b/tests/test-help.t
> > > @@ -1524,6 +1524,78 @@ Test section lookup
> > >        files         List of strings. All files modified, added, or removed by
> > >                      this changeset.
> > >
> > > +Test section lookup by translated message
> > > +
> > > +Applying str.lower() instead of encoding.lower(str) to a translated message
> > > +might make the message meaningless, because some encodings use 0x41(A) - 0x5a(Z)
> > > +as the second or later byte of a multi-byte character.
> > > +
> > > +For example, "\x8bL\x98^" (translation of "record" in ja_JP.cp932)
> > > +contains 0x4c (L). str.lower() replaces 0x4c(L) with 0x6c(l), and this
> > > +replacement makes the message meaningless.
> > > +
> > > +This tests that section lookup by translated string isn't broken by
> > > +such str.lower() calls.
> > > +
> > > +  $ python <<EOF
> > > +  > def escape(s):
> > > +  >     return ''.join('\u%x' % ord(uc) for uc in s.decode('cp932'))
> > > +  > # translation of "record" in ja_JP.cp932
> > > +  > upper = "\x8bL\x98^"
> > > +  > # str.lower()-ed section name should be treated as different one
> > > +  > lower = "\x8bl\x98^"
> > > +  > with open('ambiguous.py', 'w') as fp:
> > > +  >     fp.write("""# ambiguous section names in ja_JP.cp932
> > > +  > u'''summary of extension
> > > +  >
> > > +  > %s
> > > +  > ----
> > > +  >
> > > +  > Upper name should show only this message
> > > +  >
> > > +  > %s
> > > +  > ----
> > > +  >
> > > +  > Lower name should show only this message
> > > +  >
> > > +  > subsequent section
> > > +  > ------------------
> > > +  >
> > > +  > This should be hidden at "hg help ambiguous" with section name.
> > > +  > '''
> > > +  > """ % (escape(upper), escape(lower)))
> > > +  > EOF
> > > +
> > > +  $ cat >> $HGRCPATH <<EOF
> > > +  > [extensions]
> > > +  > ambiguous = ./ambiguous.py
> > > +  > EOF
> > > +
> > > +  $ python <<EOF | sh
> > > +  > upper = "\x8bL\x98^"
> > > +  > print "hg --encoding cp932 help -e ambiguous.%s" % upper
> > > +  > EOF
> > > +  \x8bL\x98^ (esc)
> > > +  ----
> > > +
> > > +  Upper name should show only this message
> > > +
> > > +
> > > +  $ python <<EOF | sh
> > > +  > lower = "\x8bl\x98^"
> > > +  > print "hg --encoding cp932 help -e ambiguous.%s" % lower
> > > +  > EOF
> > > +  \x8bl\x98^ (esc)
> > > +  ----
> > > +
> > > +  Lower name should show only this message
> > > +
> > > +
> > > +  $ cat >> $HGRCPATH <<EOF
> > > +  > [extensions]
> > > +  > ambiguous = !
> > > +  > EOF
> > > +
> > >  Test dynamic list of merge tools only shows up once
> > >    $ hg help merge-tools
> > >    Merge Tools
> > > _______________________________________________
> > > Mercurial-devel mailing list
> > > Mercurial-devel at mercurial-scm.org
> > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> > >
> >
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp
>
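The byte-level problem described in the patch's commit message can be reproduced in a few lines of Python 3. `local_lower()` is a hypothetical helper that only approximates the decode, case-fold, re-encode idea behind Mercurial's encoding.lower(); it is not that function's actual implementation, which handles more cases.

```python
def local_lower(s: bytes, enc: str) -> bytes:
    """Case-fold s as characters in encoding enc rather than as raw bytes."""
    try:
        # Decode to text, fold case per character, then encode back.
        return s.decode(enc).lower().encode(enc)
    except UnicodeError:
        # Cannot round-trip through enc: fall back to ASCII-only byte folding.
        return s.lower()


# "record" translated into Japanese, encoded as cp932 (bytes from the test).
upper = b"\x8bL\x98^"

# bytes.lower() rewrites the trail byte 0x4c ('L') to 0x6c ('l') ...
naive = upper.lower()
print(naive == b"\x8bl\x98^")                          # True
# ... which turns the first character into a different kanji.
print(naive.decode("cp932") == upper.decode("cp932"))  # False

# Folding decoded characters leaves the kanji (which have no case) intact.
print(local_lower(upper, "cp932") == upper)            # True
print(local_lower(b"Record", "cp932"))                 # b'record'
```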

