[PATCH] revset: make filteredset.__nonzero__ respect the order of the filteredset

Augie Fackler raf at durin42.com
Mon Jun 6 12:03:08 EDT 2016


On Thu, Jun 02, 2016 at 10:58:31PM +0100, Kostia Balytskyi wrote:
> # HG changeset patch
> # User Kostia Balytskyi <ikostia at fb.com>
> # Date 1464903541 -3600
> #      Thu Jun 02 22:39:01 2016 +0100
> # Node ID ac1d06eeb8558f93d8f013b23ca9ce20892b9fc0
> # Parent  9ac309946df9e69bb73ded75f2fc5b84a4e785c2
> revset: make filteredset.__nonzero__ respect the order of the filteredset

Seems reasonable, and a worthwhile perf fix. Queued.

>
> This fix allows __nonzero__ to respect the direction of iteration of the
> whole filteredset. Here's the case when it matters. Imagine that we have a
> very large repository and we want to execute a command like:
>
>     $ hg log --rev '(tip:0) and user(ikostia)' --limit 1
>
> (we want to get the latest commit by me).
>
> Mercurial will evaluate a filteredset lazy data structure, an
> instance of the filteredset class, which will know that it has to iterate
> in a descending order (isdescending() will return True if called). This
> means that when some code iterates over the instance of this filteredset,
> the 'and user(ikostia)' condition will be first checked on the latest
> revision, then on the second latest and so on, allowing Mercurial to
> print matches as it founds them. However, cmdutil.getgraphlogrevs
> contains the following code:
>
>     revs = _logrevs(repo, opts)
>     if not revs:
>         return revset.baseset(), None, None
>
> The "not revs" expression is evaluated by calling filteredset.__nonzero__,
> which in its current implementation will try to iterate the filteredset
> in ascending order until it finds a revision that matches the 'and user(..'
> condition. If the condition is only true on late revisions, a lot of
> useless iterations will be done. These iterations could be avoided if
> __nonzero__ followed the order of the filteredset, which in my opinion
> is a sensible thing to do here.
>
> The problem gets even worse when instead of 'user(ikostia)' some more
> expensive check is performed, like grepping the commit diff.
>
>
> I tested this fix on a very large repo where tip is my commit and my very
> first commit comes fairly late in the revision history. Results of timing
> of the above command on that very large repo.
>
> -with my fix:
> real    0m1.795s
> user    0m1.657s
> sys     0m0.135s
>
> -without my fix:
> real    1m29.245s
> user    1m28.223s
> sys     0m0.929s
>
> I understand that this is a very specific kind of problem that presents
> itself very rarely, only on very big repositories and with expensive
> checks and so on. But I don't see any disadvantages to this kind of fix
> either.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2743,9 +2743,16 @@
>          return lambda: self._iterfilter(it())
>
>      def __nonzero__(self):
> -        fast = self.fastasc
> -        if fast is None:
> -            fast = self.fastdesc
> +        fast = None
> +        candidates = [self.fastasc if self.isascending() else None,
> +                      self.fastdesc if self.isdescending() else None,
> +                      self.fastasc,
> +                      self.fastdesc]
> +        for candidate in candidates:
> +            if candidate is not None:
> +                fast = candidate
> +                break
> +
>          if fast is not None:
>              it = fast()
>          else:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list