[PATCH evolve-ext] inhibit: improve transaction marker perf

Tue Nov 17 20:54:29 CST 2015

On 11/08/2015 09:24 AM, Durham Goode wrote:
>
>
> On 11/8/15 12:04 AM, Martin von Zweigbergk wrote:
>>
>>
>> On Sat, Nov 7, 2015 at 10:15 PM Durham Goode <durham at fb.com> wrote:
>>
>>
>>
>>     On 11/7/15 9:52 PM, Martin von Zweigbergk wrote:
>>>
>>>
>>>     On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham at fb.com> wrote:
>>>
>>>         # HG changeset patch
>>>         # User Durham Goode <durham at fb.com>
>>>         # Date 1446945001 28800
>>>         #      Sat Nov 07 17:10:01 2015 -0800
>>>         # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62
>>>         # Parent  48547b4c77defdd17c670b1eb0eb94272edf0207
>>>         inhibit: improve transaction marker perf
>>>
>>>         The old algorithm was a revset "::X and obsolete()". This was
>>>         inefficient because it required walking all the way down the
>>>         ancestor chain (since the revset did not know it could stop
>>>         walking at public nodes).
>>>
>>>
>>>     I was hoping to reproduce the slowness on the Mozilla repo (270k
>>>     revisions), but "hg log -r '::tip and obsolete()'" runs in 180
>>>     ms. Do you have a better command for me to try? How many
>>>     obsmarkers are in the repo you tried it on?
>>     You need to make sure you pass --hidden, otherwise the obsolete()
>>     revset resolves to an empty set and tests against it are cheap.
>>
>>
>> I'm still not able to reproduce this :-( Some tests take ~800 ms, but
>> most of that time seems to be spent loading obsmarkers, not
>> iterating over the revset.
>>
>> I wanted to see if I could get the same results by playing with the
>> revset optimizer. I have never looked at that code before and I don't
>> know if it's a good idea. If it is, I'll have to let you do it
>> yourself since I can't even test it.
> I looked into this a bit more.  I think this performance is due to the
> nature of our largest repo.  We had three large repos which were then
> merged together, so there are three distinct branches in history.  The
> --profile of this revset is as follows:
>
> ~/foo> time hg log -r '::tip and obsolete()'  --hidden --profile
> | 100.0%  cmdutil.py:     getlogrevs               line 4792:  revs, expr, filematcher = c...
>   \ 93.3%  revset.py:      __nonzero__              line 2135:  if not revs:
>     | 93.3%  revset.py:      _iterfilter            line 3110:  for r in it:
>     | 93.3%  revset.py:      _desccontains          line 3084:  if cond(x):
>     | 89.6%  revset.py:      _consumegen            line 3460:  for l in self._consumegen():
>     | 82.2%  revset.py:      iterate                line 3508:  for item in self._gen:
>     | 30.4%  changelog.py:   parentrevs             line 56:  for parent in cl.parentrevs...
>     | 13.3%  revlog.py:      parentrevs             line 231:  return super(changelog, sel...
>
> Most of the time is in iterate, which is where the heapq management
> happens. I think having the three branches in history causes more heap
> work than a single history would. In fact, if I do "(A::tip | B::tip |
> C::tip) and obsolete()", where A, B and C are the roots of the three
> histories, it's way faster than just ::tip, and iterate disappears
> from the profile.
>
> Either way, I think my fix still applies, since it fixes the
> algorithmic complexity (the O()) entirely.

A::tip uses a completely different algorithm from ::tip. It is non-lazy
and implemented in C, so I'm not surprised that it speeds things up.
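
To make the difference discussed in the patch description concrete, here
is a minimal Python sketch of a lazy, heap-based ancestor walk that can
optionally stop at public revisions. It is not the code from revset.py or
from the patch: the function name, the dict-based toy DAG and the
"public" set are all made up for illustration. It relies on two
assumptions: public revisions are never obsolete, and every ancestor of
a public revision is itself public.

```python
import heapq


def obsolete_ancestors(parents, obsolete, start, public=frozenset()):
    """Return (obsolete ancestors of start, number of revisions popped).

    Hypothetical helper, not a Mercurial API. `parents` maps a revision
    number to a tuple of parent revision numbers. With an empty `public`
    set this walks the whole ancestor chain, like "::X and obsolete()".
    """
    found, visited = [], 0
    heap, seen = [-start], {start}  # negate revs: heapq is a min-heap
    while heap:
        rev = -heapq.heappop(heap)
        visited += 1
        if rev in public:
            # Assumption: nothing at or below a public rev is obsolete,
            # so there is no need to push its parents.
            continue
        if rev in obsolete:
            found.append(rev)
        for p in parents[rev]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, -p)
    return found, visited


# Toy linear history: revs 0-5 are public, 6-8 are draft, 7 is obsolete.
parents = {0: ()}
parents.update({r: (r - 1,) for r in range(1, 9)})
public = frozenset(range(6))
obsolete = {7}

naive = obsolete_ancestors(parents, obsolete, 8)
bounded = obsolete_ancestors(parents, obsolete, 8, public)
print(naive, bounded)
```

Both calls find the same obsolete revision, but the bounded walk stops as
soon as it reaches the first public revision instead of popping every
ancestor off the heap, so its cost depends on the amount of draft history
rather than on the total size of the repository.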

-- 
Pierre-Yves David