[PATCH 2 of 6 V2] hidden: use _domainancestors to compute revs revealed by dynamic blocker

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed May 24 08:39:07 EDT 2017


On 05/24/2017 04:10 AM, Gregory Szorc wrote:
> On Tue, May 23, 2017 at 1:02 PM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david at octobus.net>
>     # Date 1495373721 -7200
>     #      Sun May 21 15:35:21 2017 +0200
>     # Node ID e72ddd1a53c4c6321e7ecd686cd24c2a8c8914bc
>     # Parent  5f964af88a0fae242ce24b0478c676d2056e0dc6
>     # EXP-Topic fast-compute-hidden
>     # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>     #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r e72ddd1a53c4
>     hidden: use _domainancestors to compute revs revealed by dynamic blocker
>
>     The complexity of computing the revealed changesets is now
>     'O(revealed)'.
>     This massively speeds up the computation on large repository. Moving
>     it to the
>     millisecond range.
>
>     Below are timing from two Mozilla repositories with different contents:
>
>     1) mozilla repository with:
>      * 400667 changesets
>      * 35 hidden changesets (first rev-268334)
>      * 288 visible drafts
>      * obsolete working copy (dynamicblockers),
>
>     Before:
>     ! visible
>     ! wall 0.030247 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
>
>     After:
>     ! visible
>     ! wall 0.000585 comb 0.000000 user 0.000000 sys 0.000000 (best of 4221)
>
>     The timing above include the computation of obsolete changeset:
>     ! obsolete
>     ! wall 0.000396 comb 0.000000 user 0.000000 sys 0.000000 (best of 6816)
>
>     So adjusted time give 30ms before versus 0.2ms after. A 150x speedup.
>
>     2) mozilla repository with:
>      * 405645 changesets
>      * 4312 hidden changesets (first rev-326004)
>      * 264 visible drafts
>      * obsolete working copy (dynamicblockers),
>
>     Before:
>     ! visible
>     ! wall 0.168658 comb 0.170000 user 0.170000 sys 0.000000 (best of 48)
>
>     After
>     ! visible
>     ! wall 0.008612 comb 0.010000 user 0.010000 sys 0.000000 (best of 325)
>
>     The timing above include the computation of obsolete changeset:
>     ! obsolete
>     ! wall 0.006408 comb 0.010000 user 0.010000 sys 0.000000 (best of 404)
>
>     So adjusted time give 160ms before versus 2ms after. A 75x speedup.
>
>     diff --git a/mercurial/repoview.py b/mercurial/repoview.py
>     --- a/mercurial/repoview.py
>     +++ b/mercurial/repoview.py
>     @@ -212,8 +212,10 @@ def computehidden(repo):
>              # changesets and remove those.
>              dynamic = hidden & revealedrevs(repo)
>              if dynamic:
>     -            blocked = cl.ancestors(dynamic, inclusive=True)
>     -            hidden = frozenset(r for r in hidden if r not in blocked)
>
>
> An obvious problem with this old code is that the "r not in blocked" bit
> is an O(n) list lookup.

Actually, this is not a list lookup. `cl.ancestors` returns a smart lazy 
ancestors object[1]. The object uses a set for membership testing[2] and 
walks the changelog on demand to fill it ('r in lazyancestors' stops 
walking once 'r' has been passed).

[1] code for changelog.ancestors:
    https://www.mercurial-scm.org/repo/hg/file/bdc4861ffe59/mercurial/revlog.py#l574
[2] lazy ancestors membership lookup:
    https://www.mercurial-scm.org/repo/hg/file/bdc4861ffe59/mercurial/ancestor.py#l334
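
To illustrate the lazy membership test described above, here is a minimal 
toy sketch. It is not Mercurial's actual class: the name `LazyAncestors`, 
the `parents` dict and the `visited` counter are made up for the example. 
It only assumes that parents always have a lower revision number than 
their children.

```python
import heapq


class LazyAncestors:
    """Toy stand-in for a lazy ancestors object (hypothetical, for illustration)."""

    def __init__(self, parents, heads):
        # parents: dict mapping rev -> tuple of parent revs (each parent < child)
        self._parents = parents
        self._seen = set(heads)            # inclusive: the heads are members
        self._heap = [-r for r in heads]   # max-heap emulated with negated revs
        heapq.heapify(self._heap)
        self.visited = 0                   # number of revs expanded so far

    def __contains__(self, target):
        if target in self._seen:
            return True
        # Expand revs in decreasing order; stop once we pass `target`.
        while self._heap and -self._heap[0] > target:
            current = -heapq.heappop(self._heap)
            self.visited += 1
            for p in self._parents.get(current, ()):
                if p not in self._seen:
                    self._seen.add(p)
                    heapq.heappush(self._heap, -p)
            if target in self._seen:
                return True
        # Everything left to expand is below `target`: it is not an ancestor.
        return False


# Linear history 0 <- 1 <- ... <- 9
linear = {r: (r - 1,) for r in range(1, 10)}
linear[0] = ()
anc = LazyAncestors(linear, [9])
print(7 in anc, anc.visited)  # True 2
```

Testing for a revision near the bottom of the history makes the same 
object walk almost everything, which is the cost discussed in the rest of 
this mail.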

> Out of curiosity, how does the perf of "blocked
> = set(cl.ancestors(dynamic, inclusive=True))" compare to the new code
> using _domainancestors()?

Calling "set" on the lazy ancestors will force the object to iterate 
down to the bottom of the repository. That will not give a better result.

The speedup in this patch comes from a drastic change in the asymptotic 
complexity:
- from: O(len(repo) - rev(first(not public)))
- to:   O(len(mutable & visible))

[details below]


Complexity of the old code
==========================

The old code naively searches ancestors without taking phases into 
account. This means we walk a large number of public changesets to 
check whether an old (non-public) changeset is an ancestor of a recent 
revision.

Before this patch, we call:

     hideablerevs in cl.ancestors(blockers)

We can reduce this ("reduce" in the complexity sense) to:

     min(hideablerevs) in ancestors(max(blockers))

Which can be reduced to:

     target = min(hideablerevs)
     current = max(blockers)
     while target < current:
         current = parent(current) # one parent to simplify
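
As a rough illustration, the loop above can be run on a toy linear 
history with a step counter. The function name `old_walk_steps` is made 
up for the example. Plugging in the figures from repository (1) of the 
patch description assumes that min(hideablerevs) is the first hidden rev 
(268334), that max(blockers) is the tip (rev 400666, numbering from 0), 
and that history is linear; none of this is exact for a real repository.

```python
def old_walk_steps(target, current, parent):
    """Count the iterations of the reduced loop for the old code."""
    steps = 0
    while target < current:
        current = parent(current)  # one parent to simplify
        steps += 1
    return steps


def linear_parent(rev):
    # Toy linear history: the parent of rev N is rev N - 1.
    return rev - 1


# Assumed inputs taken from repository (1): first hidden rev and tip rev.
print(old_walk_steps(268334, 400666, linear_parent))  # 132332
```

Under those assumptions the old code walks on the order of 130000 
changesets, almost all of them public.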

Complexity of the new code
==========================

The new code finds the actual blockers and walks from them, stopping as 
soon as it reaches a public (non-mutable) changeset. So we do not walk 
more changesets than the ones that will end up visible.

A basic version of that new code is:

     # code is a bit wider for clarity
     actual_blockers = hideable & blockers
     revealed = set()
     for current in actual_blockers:
         while not public(current):
             revealed.add(current)
             current = parent(current)
     hidden = hideable - revealed
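
A runnable variant of the basic version above, extended to follow every 
parent (merges) instead of a single one. This is a sketch on a toy graph, 
not the code from the patch: `revealed_by_blockers`, the `parents` dict 
and the `public` set are names invented for the example.

```python
def revealed_by_blockers(hideable, blockers, parents, ispublic):
    """Return the non-public revs reachable from the actual blockers."""
    revealed = set()
    stack = list(hideable & blockers)  # actual blockers
    while stack:
        current = stack.pop()
        # Stop at public changesets and at anything already handled.
        if current in revealed or ispublic(current):
            continue
        revealed.add(current)
        stack.extend(parents[current])
    return revealed


# Toy graph: revs 0-2 are public, 3-6 are drafts.
#   0 <- 1 <- 2 <- 3 <- 4
#              \<- 5
#              \<- 6
parents = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (3,), 5: (2,), 6: (2,)}
public = {0, 1, 2}
hideable = {3, 4, 5}   # obsolete changesets
blockers = {4, 6}      # e.g. working copy parent and a visible draft

revealed = revealed_by_blockers(hideable, blockers, parents, public.__contains__)
hidden = hideable - revealed
print(sorted(revealed), sorted(hidden))  # [3, 4] [5]
```

The walk only ever touches revs 4, 3 and 2 here; the public part of the 
history below rev 2 is never visited.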

Is the situation clearer to you now?

Cheers,

-- 
Pierre-Yves David

