[PATCH 6 of 8] dirstate: create class for status lists

Thu Oct 2 09:26:43 CDT 2014

On Thu, Oct 2, 2014 at 5:46 AM, Mads Kiilerich <mads at kiilerich.com> wrote:
> On 10/02/2014 07:00 AM, Martin von Zweigbergk wrote:
>>
>> Thanks for taking a look! Answers below to some of your questions.
>> I'll leave the others for later, or they might just get address in v2
>> of the series.
>>
>> On Wed, Oct 1, 2014 at 6:40 PM, Mads Kiilerich <mads at kiilerich.com> wrote:
>>>
>>> A general comment: A "." lookup is relatively expensive. It seems like
>>> you
>>> introduce a lot of them. We should take care to not introduce more of
>>> them
>>> in tight loops (that might already be the case - I did not check and not
>>> see
>>> any statement on whether that was the case).
>>
>> Hmm, I don't think I even considered that (I've never used python before).
>>
>> I tried running the test suite:
>>
>> Before:
>> real    3m22.026s
>> user    25m31.715s
>> sys 5m45.684s
>>
>> After:
>> real    3m24.136s
>> user    25m29.724s
>> sys     5m46.637s
>>
>> I also tried this:
>>
>> $ python -mtimeit -s'a=(1,2,3,4,5,6)' 'a[3]'
>> 10000000 loops, best of 3: 0.0402 usec per loop
>>
>> $ python -mtimeit -s'class c(object):
>>    def __init__(self,a,b,c,d,e,f):
>>      self.a=a
>>      self.b=b
>>      self.c=c
>>      self.d=d
>>      self.e=e
>>      self.f=f
>> a=c(1,2,3,4,5,6)' 'a.d'
>> 10000000 loops, best of 3: 0.0458 usec per loop
>
>
> Yes, these two are pretty much the same. But:
>
> $ python -mtimeit -s'a=(1,2,3,4,5,6)' -s 'b=a[0]' 'x=b'
> 100000000 loops, best of 3: 0.0114 usec per loop
>
> $ python -mtimeit -s'class c(object):
>   def __init__(self,a,b,c,d,e,f):
>     self.a=a
>     self.b=b
>     self.c=c
>     self.d=d
>     self.e=e
>     self.f=f
> a=c(1,2,3,4,5,6)' 'x=a.d'
> 10000000 loops, best of 3: 0.026 usec per loop

That doesn't seem like a fair comparison to me. Assuming that '-s'
means "this is setup, don't include in timing", the only thing your
tuple test case does is to test assignment, right? So it's not
comparing tuple lookup vs field name lookup.

> That is why we do crazy stuff like http://selenic.com/hg/rev/b6dc3b79bb25
> ... but only in tight loops.

That seems to only show that looking up is slower than not looking up;
it's not comparing lookup of tuple items and field names. Still, it's
something I wasn't aware of, so I'll make sure not to add any lookups
in loops in my next revision of the series (i.e, I'll do things like
"m = s.modified" outside of loops).

> The test suite do not have "realistic" repo sizes and will not reveal new
> hot spots.
>
>>> I also wonder if it would make sense to make "unsure" an optional member
>>> of
>>> the status struct. I don't know.
>>
>> I know, and it doesn't :-). I did that at first, but it just got ugly
>> with having to assert (or assume) that lookup/unsure was empty where
>> it should be.
>
>
> Would it be that different from how .clean also is "optional"?

Yes, the problem with lookup is that it's not a real status, it's just
an optimization. Callers who care will have to resolve it to be either
modified or clean. Forgetting to resolve gives an incorrect list of
files for .modified and .clean.

>
> /Mads