[PATCH 2 of 2 STABLE] help: search section of help topic by translated section name correctly

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Mon May 16 02:20:28 EDT 2016


At Sun, 15 May 2016 20:44:46 -0700,
timeless wrote:
> The changes seem OK, but this is begging for a pair of check code tests
> (one for lower, and one for upper-- each objecting to anything that isn't
> encoding.)

Yes, I also think that detection by check-code.py is better.

But there are many '.lower()'/'.upper()' invocations in the Mercurial
source code, and I don't have a good idea for picking out only the
invalid ones (or a low-cost trick to suppress the false positives).

Do you have any good ideas?
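
For reference, the kind of breakage such a check would have to catch
can be reproduced with the minimal Python 3 sketch below. It is not
Mercurial code: safe_lower() is a hypothetical helper that only
approximates the intent of encoding.lower() by lowering characters
after decoding, rather than lowering raw bytes.

```python
# Bytes of the cp932 translation of "record", as used in the patch's test.
upper = b"\x8bL\x98^"

# Byte-wise lowercasing treats the trail byte 0x4c ('L') as an ASCII
# letter and rewrites it to 0x6c ('l'), which changes the character.
broken = upper.lower()


def safe_lower(data, enc="cp932"):
    """Hypothetical helper: lower-case characters, not raw bytes."""
    return data.decode(enc).lower().encode(enc)


print(broken == b"\x8bl\x98^")     # True: the byte sequence was altered
print(safe_lower(upper) == upper)  # True: the text has no cased letters
```

Both forms are a plain method call on a string, so a pattern match
alone cannot tell whether the operand is translated text or pure
ASCII.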


> On May 12, 2016 7:12 PM, "FUJIWARA Katsunori" <foozy at lares.dti.ne.jp> wrote:
> 
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1463091599 -32400
> > #      Fri May 13 07:19:59 2016 +0900
> > # Branch stable
> > # Node ID aaabed77791a75968a12b8c43ad263631a23ee81
> > # Parent  9d38a2061fd8a7a4fd80ead8d5798f38b359bfe3
> > help: search section of help topic by translated section name correctly
> >
> > Before this patch, "hg help topic.section" might show unexpected
> > section of help topic in some encoding.
> >
> > It applies str.lower() instead of encoding.lower(str) on translated
> > message to search section case-insensitively, but some encoding uses
> > 0x41(A) - 0x5a(Z) as the second or later byte of multi-byte character
> > (for example, ja_JP.cp932), and str.lower() causes unexpected result.
> >
> > To search section of help topic by translated section name correctly,
> > this patch replaces str.lower() by encoding.lower(str) for both query
> > string (in commands.help()) and translated help text (in
> > minirst.getsections()).
> >
> > diff --git a/mercurial/commands.py b/mercurial/commands.py
> > --- a/mercurial/commands.py
> > +++ b/mercurial/commands.py
> > @@ -4590,7 +4590,7 @@ def help_(ui, name=None, **opts):
> >      subtopic = None
> >      if name and '.' in name:
> >          name, section = name.split('.', 1)
> > -        section = section.lower()
> > +        section = encoding.lower(section)
> >          if '.' in section:
> >              subtopic, section = section.split('.', 1)
> >          else:
> > diff --git a/mercurial/minirst.py b/mercurial/minirst.py
> > --- a/mercurial/minirst.py
> > +++ b/mercurial/minirst.py
> > @@ -724,7 +724,7 @@ def getsections(blocks):
> >              x = b['key']
> >          else:
> >              x = b['lines'][0]
> > -        x = x.lower().strip('"')
> > +        x = encoding.lower(x).strip('"')
> >          if '(' in x:
> >              x = x.split('(')[0]
> >          return x
> > diff --git a/tests/test-help.t b/tests/test-help.t
> > --- a/tests/test-help.t
> > +++ b/tests/test-help.t
> > @@ -1524,6 +1524,78 @@ Test section lookup
> >        files         List of strings. All files modified, added, or removed by
> >                      this changeset.
> >
> > +Test section lookup by translated message
> > +
> > +str.lower() instead of encoding.lower(str) on translated message might
> > +make message meaningless, because some encoding uses 0x41(A) - 0x5a(Z)
> > +as the second or later byte of multi-byte character.
> > +
> > +For example, "\x8bL\x98^" (translation of "record" in ja_JP.cp932)
> > +contains 0x4c (L). str.lower() replaces 0x4c(L) by 0x6c(l) and this
> > +replacement makes message meaningless.
> > +
> > +This tests that section lookup by translated string isn't broken by
> > +such str.lower().
> > +
> > +  $ python <<EOF
> > +  > def escape(s):
> > +  >     return ''.join('\u%x' % ord(uc) for uc in s.decode('cp932'))
> > +  > # translation of "record" in ja_JP.cp932
> > +  > upper = "\x8bL\x98^"
> > +  > # str.lower()-ed section name should be treated as different one
> > +  > lower = "\x8bl\x98^"
> > +  > with open('ambiguous.py', 'w') as fp:
> > +  >     fp.write("""# ambiguous section names in ja_JP.cp932
> > +  > u'''summary of extension
> > +  >
> > +  > %s
> > +  > ----
> > +  >
> > +  > Upper name should show only this message
> > +  >
> > +  > %s
> > +  > ----
> > +  >
> > +  > Lower name should show only this message
> > +  >
> > +  > subsequent section
> > +  > ------------------
> > +  >
> > +  > This should be hidden at "hg help ambiguous" with section name.
> > +  > '''
> > +  > """ % (escape(upper), escape(lower)))
> > +  > EOF
> > +
> > +  $ cat >> $HGRCPATH <<EOF
> > +  > [extensions]
> > +  > ambiguous = ./ambiguous.py
> > +  > EOF
> > +
> > +  $ python <<EOF | sh
> > +  > upper = "\x8bL\x98^"
> > +  > print "hg --encoding cp932 help -e ambiguous.%s" % upper
> > +  > EOF
> > +  \x8bL\x98^ (esc)
> > +  ----
> > +
> > +  Upper name should show only this message
> > +
> > +
> > +  $ python <<EOF | sh
> > +  > lower = "\x8bl\x98^"
> > +  > print "hg --encoding cp932 help -e ambiguous.%s" % lower
> > +  > EOF
> > +  \x8bl\x98^ (esc)
> > +  ----
> > +
> > +  Lower name should show only this message
> > +
> > +
> > +  $ cat >> $HGRCPATH <<EOF
> > +  > [extensions]
> > +  > ambiguous = !
> > +  > EOF
> > +
> >  Test dynamic list of merge tools only shows up once
> >    $ hg help merge-tools
> >    Merge Tools

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

