[PATCH 1 of 4 STABLE] i18n: use UTF-8-ed string to check use of characters reserved for Windows

Sat Dec 24 14:31:11 UTC 2011

# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1324735129 -32400
# Branch stable
# Node ID 01fe3735501eeca6004a36d1f961ec50414ef163
# Parent  32a6e00e4cfe84573b9d4dc71155d0dc23b75faf
i18n: use UTF-8-ed string to check use of characters reserved for Windows

Some problematic encodings (e.g. cp932) use bytes that coincide with
characters reserved on Windows (e.g. "|") inside the byte sequences of
multi-byte characters.

Checking the raw byte sequence directly therefore causes unexpected
warnings or aborts with such encodings.
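A minimal Python 3 sketch of the problem follows. It assumes that
`WIN_RESERVED` mirrors Mercurial's `_winreservedchars`; the name and
its contents are illustrative, not imported from Mercurial. In cp932
the katakana "ポ" is encoded as 0x83 0x7C, and 0x7C is the ASCII code
for "|", so a byte-wise scan reports a reserved character that the
user never typed.

```python
# Illustrative copy of the characters reserved on Windows; assumed to
# mirror Mercurial's _winreservedchars, not imported from Mercurial.
WIN_RESERVED = set(b':*?"<>|')


def reserved_in_raw_bytes(name: bytes) -> list:
    """Naive check: scan the raw (local-encoding) bytes one by one."""
    return [chr(b) for b in name if b in WIN_RESERVED]


raw = "ポ.txt".encode("cp932")
print(raw)                          # b'\x83|.txt'
print(reserved_in_raw_bytes(raw))   # ['|']  (a false positive)
```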

This patch checks the UTF-8-encoded string instead of the raw one.

In the UTF-8-encoded string, such ambiguous byte sequences are
converted into bytes in the non-ASCII range (0x80 and above), while
genuine uses of reserved characters are kept as they are.
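A minimal Python 3 sketch of the patched check on a single path
component, under stated assumptions: `to_utf8()` is a hypothetical,
simplified stand-in for `encoding.fromlocal()` (the real function has
its own error handling), and `WIN_RESERVED` is an illustrative copy of
`_winreservedchars`. It shows why the conversion works: every byte of
a multi-byte UTF-8 sequence is 0x80 or above, so only a real ASCII
reserved character can match.

```python
# Illustrative copy of the characters reserved on Windows (assumed to
# mirror Mercurial's _winreservedchars).
WIN_RESERVED = set(b':*?"<>|')


def to_utf8(name: bytes, local_encoding: str) -> bytes:
    """Hypothetical stand-in for encoding.fromlocal(): local bytes -> UTF-8."""
    return name.decode(local_encoding).encode("utf-8")


def reserved_in_utf8(name: bytes, local_encoding: str) -> list:
    """Patched check: scan the UTF-8 bytes instead of the raw ones.

    Bytes of multi-byte UTF-8 sequences are all >= 0x80 and never match
    an ASCII reserved character; ASCII characters keep their byte value.
    """
    return [chr(b) for b in to_utf8(name, local_encoding) if b in WIN_RESERVED]


print(reserved_in_utf8("ポ.txt".encode("cp932"), "cp932"))   # []
print(reserved_in_utf8("ポ|.txt".encode("cp932"), "cp932"))  # ['|']
```

The component is converted once and then scanned byte by byte, which
is the approach the patch takes instead of handling each character
separately.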

This is cheaper than iterating per character instead of per byte,
because it does not require invoking "encode()" on each character.

diff -r 32a6e00e4cfe -r 01fe3735501e mercurial/util.py

--- a/mercurial/util.py	Tue Dec 20 14:11:14 2011 -0600
+++ b/mercurial/util.py	Sat Dec 24 22:58:49 2011 +0900
@@ -533,7 +533,7 @@
     for n in path.replace('\\', '/').split('/'):
         if not n:
             continue
-        for c in n:
+        for c in encoding.fromlocal(n):
             if c in _winreservedchars:
                 return _("filename contains '%s', which is reserved "
                          "on Windows") % c