[PATCH 4 of 5] encoding: add fast path of from/tolocal() for ASCII strings
Yuya Nishihara
yuya at tcha.org
Fri Aug 18 10:14:12 EDT 2017
# HG changeset patch
# User Yuya Nishihara <yuya at tcha.org>
# Date 1492920383 -32400
# Sun Apr 23 13:06:23 2017 +0900
# Node ID a359d30ebd4a3d7a26717d85564716290bc028c9
# Parent 394f90f7dacb585c23ec5efa8958c37620ce2418
encoding: add fast path of from/tolocal() for ASCII strings
This is a micro-optimization, but it seems worthwhile since to/fromlocal() is
called many times and isasciistr() is cheap and simple.
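As a rough illustration of the idea (not the actual Mercurial code), the fast
path can be sketched as below. The names is_ascii_bytes() and
to_local_sketch(), the latin-1 default, and the stand-in slow path are all
hypothetical; only the shape of the early return mirrors this patch.

```python
def is_ascii_bytes(s):
    """Return True if every byte of s is in the 7-bit ASCII range."""
    try:
        s.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


def to_local_sketch(s, local_encoding='latin-1'):
    """Convert UTF-8 bytes to a local encoding, skipping pure-ASCII input.

    Hypothetical stand-in for tolocal(); assumes the local encoding is
    ASCII-compatible.
    """
    if is_ascii_bytes(s):
        # Fast path: ASCII bytes are the same in UTF-8 and in any
        # ASCII-compatible local encoding, so hand back s itself.
        return s
    # Stand-in slow path: decode as UTF-8, re-encode for the local charset.
    return s.decode('utf-8').encode(local_encoding, 'replace')


ascii_input = b'foo: bar'
print(to_local_sketch(ascii_input) is ascii_input)
print(to_local_sketch(b'foo: \xc3\xa4'))
```

Returning the very same object is what lets a test check the fast path with
an identity comparison rather than an equality comparison.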
We boldly assume that every non-ASCII character is encoded with at least one
8-bit byte (a byte in the 0x80-0xff range). This isn't true for some email
character sets (e.g. ISO-2022-JP and UTF-7), but I believe no such encoding is
used as a platform default. Shift_JIS, crappy as it is, is okay since its
non-ASCII characters should have a lead byte in the 0x80-0xff range.
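The assumption can be checked by hand with Python's standard codecs. The
snippet below is only a sketch: has_8bit_byte() is a hypothetical helper, and
a single sample character does not prove the property for a whole encoding.

```python
# HIRAGANA LETTER A, used here as a sample non-ASCII character.
text = u'\u3042'


def has_8bit_byte(data):
    """Return True if any byte of data has its high bit set."""
    return any(b >= 0x80 for b in bytearray(data))


for name in ('utf-8', 'shift_jis', 'euc_jp', 'iso2022_jp', 'utf-7'):
    encoded = text.encode(name)
    print('%-10s %r %s' % (name, encoded, has_8bit_byte(encoded)))
```

For this character, UTF-8, Shift_JIS and EUC-JP each produce at least one
byte >= 0x80, while ISO-2022-JP and UTF-7 produce 7-bit bytes only, so an
ASCII check would misclassify text in the latter two.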
(with mercurial repo)
$ export HGRCPATH=/dev/null HGPLAIN=
$ hg log --time --config experimental.stabilization=all > /dev/null
(original)
time: real 7.460 secs (user 7.420+0.000 sys 0.030+0.000)
time: real 7.670 secs (user 7.590+0.000 sys 0.080+0.000)
time: real 7.560 secs (user 7.510+0.000 sys 0.040+0.000)
(this patch)
time: real 7.340 secs (user 7.260+0.000 sys 0.060+0.000)
time: real 7.260 secs (user 7.210+0.000 sys 0.030+0.000)
time: real 7.310 secs (user 7.260+0.000 sys 0.060+0.000)
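For a quick read of the numbers above, the wall-clock times can be averaged
as in the sketch below. Three runs per side is a small sample, so the result
is only a rough indication.

```python
# "real" times in seconds, copied from the runs above.
original = [7.460, 7.670, 7.560]
patched = [7.340, 7.260, 7.310]


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


saved = mean(original) - mean(patched)
percent = 100.0 * saved / mean(original)

print('mean original: %.3f s' % mean(original))
print('mean patched:  %.3f s' % mean(patched))
print('saved:         %.3f s (%.1f%%)' % (saved, percent))
```

This works out to about 0.26 s saved per run, roughly 3.4% of the original
mean.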
diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -128,6 +128,9 @@ def tolocal(s):
     'foo: \\xc3\\xa4'
     """
 
+    if isasciistr(s):
+        return s
+
     try:
         try:
             # make sure string is actually stored in UTF-8
@@ -170,6 +173,8 @@ def fromlocal(s):
     # can we do a lossless round-trip?
     if isinstance(s, localstr):
         return s._utf8
+    if isasciistr(s):
+        return s
 
     try:
         u = s.decode(_sysstr(encoding), _sysstr(encodingmode))
diff --git a/tests/test-encoding-func.py b/tests/test-encoding-func.py
--- a/tests/test-encoding-func.py
+++ b/tests/test-encoding-func.py
@@ -28,6 +28,12 @@ class IsasciistrTest(unittest.TestCase):
t[i] |= 0x80
self.assertFalse(encoding.isasciistr(bytes(t)))
+class LocalEncodingTest(unittest.TestCase):
+    def testasciifastpath(self):
+        s = b'\0' * 100
+        self.assertTrue(s is encoding.tolocal(s))
+        self.assertTrue(s is encoding.fromlocal(s))
+
 if __name__ == '__main__':
     import silenttestrunner
     silenttestrunner.main(__name__)