[PATCH] cleanup: use x in (a, b) instead of x == a or x == b

Thu Sep 23 15:05:09 CDT 2010

On Thu, Sep 23, 2010 at 12:41 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, 2010-09-23 at 13:09 +0200, Martin Geisler wrote:
>> Masklinn <masklinn at masklinn.net> writes:
>>
>> > On 2010-09-23, at 13:24 , Martin Geisler wrote:
>> >> Brodie Rao <brodie at bitheap.org> writes:
>> >>> # HG changeset patch
>> >>> # User Brodie Rao <brodie at bitheap.org>
>> >>> # Date 1285218151 18000
>> >>> # Node ID a9641ee7e37cf6da972ef07e9c8992316587c51a
>> >>> # Parent  f1e8d6f6e682a74a135ba6509b2de26cc8172046
>> >>> cleanup: use x in (a, b) instead of x == a or x == b
>> >>
>> >> I believe I tested this at some point using timeit and found that the
>> >> longer form is faster. Did you test is too?
>> >
>> > Unless the potentially containing sequence is cached, it probably
>> > won't be faster. But in cases where performance on the order of a
>> > fraction of a nanosecond per call[0] is not a very important matter, I
>> > think the readability and DRY of the `in` form is preferable to the
>> > double-eq. YMMV
>>
>> Yeah, I should not have focussed so much on the raw performance of this.
>> I was just being silly, I'll push it in a second.
>
> I should have chimed in last night.
>
> First, I think this is basically negligible as a visual style
> improvement.
>
> Second, I think Masklinn's performance analysis is wrong because it was
> run against literals. I've timed this in the last couple months, I think
> in response to a patch from Nic. Remember, Python has basically no
> optimization in its compiler and we can't really expect it to do well
> with a change like:
>
> -                if p == a or p == b: # did we find a or b as a parent?
> +                if p in (a, b): # did we find a or b as a parent?
>
> (hey look, a completely pointless comment!)
>
> The first the compiler translates into something approximately
> equivalent to the C code (ok, really, it's interpreted bytecode and
> completely different, but this is illustrative of what goes on in a
> high-level language):
>
>   if (compareobjects(p, a) == 0 || compareobjects(p, b) == 0) {
>
> while the second turns into something like:
>
>   t = buildtuple(2, a, b); # varargs func, trip to memory allocator
>
>   if (objectcontains(t, p)) {  # double-indirect method lookup
>   ...
>   deleteobject(t); # another trip to memory allocator
>   ...
>
>   int tuplecontains(object *t, object *p)
>   {
>       int i;
>
>       for (i = 0; i < tuplelen(t); i++) # not a constant!
>           if (compareobjects(p, a))
>               return 1;
>       return 0;
>   }
>
> In other words, this kind of construct is just making life hard for
> Python. Wrapping something in parens is deceptively expensive.
>
> And indeed this is quite visible in a simple test:
>
> optimal:
> $ python -m timeit -s 'a, b, c = 0, 0, 2' -c 'if a == b or a == c: pass'
> 10000000 loops, best of 3: 0.083 usec per loop
> $ python -m timeit -s 'a, b, c = 0, 0, 2' -c 'if a == (b, c): pass'
> 1000000 loops, best of 3: 0.28 usec per loop
>
> pessimal:
> $ python -m timeit -s 'a, b, c = 0, 1, 2' -c 'if a == b or a == c: pass'
> 10000000 loops, best of 3: 0.134 usec per loop
> $ python -m timeit -s 'a, b, c = 0, 1, 2' -c 'if a == (b, c): pass'
> 1000000 loops, best of 3: 0.29 usec per loop
>
> Ouch! >2x to >3x.
>
> I'll do a (linear) backout on this after breakfast.

Whoops, you're right. I should've included some times with the patch.
Sorry about that.

Testing for individual characters with a string seems faster, though.

Optimal:

$ python -m timeit -s 'c = "a"' 'if c == "a" or c == "b": pass'
10000000 loops, best of 3: 0.0402 usec per loop
$ python -m timeit -s 'c = "a"' 'if c in "ab": pass'
10000000 loops, best of 3: 0.0391 usec per loop

Pessimal:

$ python -m timeit -s 'c = "z"' 'if c == "a" or c == "b": pass'
10000000 loops, best of 3: 0.0746 usec per loop
$ python -m timeit -s 'c = "z"' 'if c in "ab": pass'
10000000 loops, best of 3: 0.0426 usec per loop

I think the parts of the patch that use the if c in "abc" form should stay.