[PATCH] Also treat "East Asian Ambiguous" characters as full-width

Shun-ichi Goto shunichi.goto at gmail.com
Thu Oct 29 21:46:52 CDT 2009


# HG changeset patch
# User Shun-ichi GOTO <shunichi.goto at gmail.com>
# Date 1256870784 -32400
# Node ID ba57125215c3a2adc760e98668e523a13f647dac
# Parent  3c30ae2d6f1bc0b74c87ea46b695e59485f79414
Also treat "East Asian Ambiguous" characters as full-width.

"East Asian Ambiguous" characters like 'GREEK SMALL LETTER BETA'
(U+03B2) or 'MULTIPLICATION SIGN' (U+00D7) should be counted as
full-width, because their display width depends on the context and
they are rendered full-width in East Asian environments.

See also:
 "Unicode Standard Annex #11 - East Asian Width"
 http://www.unicode.org/reports/tr11/tr11-14.html#Ambiguous

diff -r 3c30ae2d6f1b -r ba57125215c3 mercurial/encoding.py
--- a/mercurial/encoding.py	Wed Oct 28 13:36:23 2009 +0900
+++ b/mercurial/encoding.py	Fri Oct 30 11:46:24 2009 +0900
@@ -72,6 +72,6 @@
     d = s.decode(encoding, 'replace')
     if hasattr(unicodedata, 'east_asian_width'):
         w = unicodedata.east_asian_width
-        return sum([w(c) in 'WF' and 2 or 1 for c in d])
+        return sum([w(c) in 'WFA' and 2 or 1 for c in d])
     return len(d)
 

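For reference, the effect of the one-character change can be sketched as a
standalone function. This is only an illustration written for Python 3, not
the code in mercurial/encoding.py: the name `colwidth` and the `wide_classes`
parameter are hypothetical, and the input is assumed to be already decoded
text rather than a byte string.

```python
import unicodedata

# East Asian Width classes counted as two columns after this patch:
# W (Wide), F (Fullwidth) and, newly, A (Ambiguous).
WIDE_WITH_AMBIGUOUS = frozenset(["W", "F", "A"])


def colwidth(text, wide_classes=WIDE_WITH_AMBIGUOUS):
    """Return the number of terminal columns assumed for `text`.

    Hypothetical helper: each character whose East Asian Width class is
    in `wide_classes` counts as 2 columns, every other character as 1.
    """
    return sum(
        2 if unicodedata.east_asian_width(c) in wide_classes else 1
        for c in text
    )


if __name__ == "__main__":
    beta = "\u03b2"  # GREEK SMALL LETTER BETA, class A (Ambiguous)
    # Behaviour before the patch: only W and F are wide.
    print(colwidth(beta, wide_classes=frozenset(["W", "F"])))
    # Behaviour after the patch: A is wide as well.
    print(colwidth(beta))
```

Passing the set of wide classes as a parameter makes the before and after
behaviour easy to compare. The patch itself hard-codes 'WFA'.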
