[PATCH 2 of 4] Do not use str.startswith or str.endswith to find a single character

Nicolas Dumazet nicdumz at gmail.com
Sun Aug 23 08:21:00 CDT 2009


2009/8/23 Benoit Boissinot <benoit.boissinot at ens-lyon.org>:
> On Sat, Aug 22, 2009 at 08:39:43PM +0200, Nicolas Dumazet wrote:
>> # HG changeset patch
>> # User Nicolas Dumazet <nicdumz.commits at gmail.com>
>> # Date 1250960622 -7200
>> # Node ID 51f0751ed3681aee75842e2351445ff30054062b
>> # Parent  f7a1dd3fa57b6be41c0c5535a16531b710571bb7
>> Do not use str.startswith or str.endswith to find a single character
>>
>> When startswith (resp. endswith) is called on a single character, it is two times
>> faster to directly compare that single character with str[0] (resp. str[-1]).
>>
>> * str.startswith(char) -> (str and) str[0] == char
>> * str.endswith(char) -> (str and) str[-1] == char
>
> I don't really like this one, I think it hurts the readability. Did you
> check the python source to understand the issue (and does your benchmark
> include the time to test if the string is not empty?) ?

I agree: it does hurt readability when we have to include
non-emptiness tests. But when we know that the string can't be empty,
I think that s[0] == 'a' is just as readable as s.startswith('a').
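
To make the empty-string caveat concrete, here is a minimal sketch. The
helper names are hypothetical and are not part of the patch. The slice
variant is my own addition for comparison, not something the patch uses.

```python
def starts_with_char(s, char):
    """Index-based check; only valid when s is known to be non-empty."""
    return s[0] == char


def starts_with_char_guarded(s, char):
    """Guarded form from the patch description: (s and) s[0] == char.

    Note that for an empty s this returns s itself (''), which is falsy
    but is not the boolean False.
    """
    return s and s[0] == char


def starts_with_char_slice(s, char):
    """Slice-based check (not in the patch): s[:1] is '' when s is empty,
    so no IndexError is raised."""
    return s[:1] == char


if __name__ == "__main__":
    print(starts_with_char("abc", "a"))
    print(repr(starts_with_char_guarded("", "a")))
    print(starts_with_char_slice("", "a"))
    try:
        starts_with_char("", "a")
    except IndexError:
        print("s[0] raises IndexError on an empty string")
```

The unguarded form is only a drop-in replacement for startswith when the
call site already guarantees a non-empty string.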

At the C level, we are comparing (string conversions, size comparisons
and a memcmp) against (a pointer access and the creation of a
single-character string, knowing that single-character strings are
cached in CPython).

A few figures (Python 2.6):

Matching character (s = 'a', tested against 'a'): about 1.8x faster

> python -m timeit "s='a'; s and 'a'==s[0]"
1000000 loops, best of 3: 0.646 usec per loop
> python -m timeit "s='a'; s.startswith('a')"
1000000 loops, best of 3: 1.14 usec per loop

Non-matching character (s = 'a', tested against 'b'): about 1.8x faster

> python -m timeit "s='a'; s and 'b'==s[0]"
1000000 loops, best of 3: 0.645 usec per loop
> python -m timeit "s='a'; s.startswith('b')"
1000000 loops, best of 3: 1.14 usec per loop

Empty string (s = '', tested against 'b'): about 3.4x faster

> python -m timeit "s=''; s and 'b'==s[0]"
1000000 loops, best of 3: 0.298 usec per loop
> python -m timeit "s=''; s.startswith('b')"
1000000 loops, best of 3: 1 usec per loop
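
For anyone who wants to rerun this on their own interpreter, here is a
rough sketch using the timeit module. The function names and the CASES
table are mine, purely for illustration. Unlike the command lines above,
the assignment to s is moved into the setup, so absolute numbers will not
match the figures I quoted, and they will vary between machines and
Python versions anyway.

```python
import timeit

# (setup, index-based statement, startswith-based statement)
CASES = [
    ("s = 'a'", "s and 'a' == s[0]", "s.startswith('a')"),
    ("s = 'a'", "s and 'b' == s[0]", "s.startswith('b')"),
    ("s = ''", "s and 'b' == s[0]", "s.startswith('b')"),
]


def best_usec(stmt, setup, number=100000, repeat=3):
    """Best per-loop time in microseconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


def run_cases(number=100000, repeat=3):
    """Time both variants for every case; returns a list of tuples."""
    results = []
    for setup, indexed, method in CASES:
        t_index = best_usec(indexed, setup, number, repeat)
        t_method = best_usec(method, setup, number, repeat)
        results.append((setup, indexed, t_index, method, t_method))
    return results


if __name__ == "__main__":
    for setup, indexed, t_index, method, t_method in run_cases():
        print("%s" % setup)
        print("  %-22s %.3f usec per loop" % (indexed, t_index))
        print("  %-22s %.3f usec per loop" % (method, t_method))
```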


I would still put readability ahead of performance: what about
introducing these changes only where the target string is known to
always be non-empty? In that case, trading "str.startswith(char)" for
"str[0] == char" in exchange for a roughly x2 speedup seems reasonable
to me.

-- 
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]


