[PATCH 1 of 8] _addrevision: refactor out the selection of candidate revisions

Sun Jan 14 15:46:35 EST 2018

On Sun, Jan 14, 2018 at 2:28 AM, Paul Morelle <paul.morelle at octobus.net>
wrote:

> # HG changeset patch
> # User Paul Morelle <paul.morelle at octobus.net>
> # Date 1515668342 -3600
> #      Thu Jan 11 11:59:02 2018 +0100
> # Node ID 7526dfca3d32e7c51864c21de2c2f4735c4cade6
> # Parent  4b68ca118d8d316cff1fbfe260e8fdb0dae3e26a
> # EXP-Topic refactor-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 7526dfca3d32
> _addrevision: refactor out the selection of candidate revisions
>

This refactor seems to preserve the existing behavior AFAICT. The code is
not easy to read. But that was a preexisting problem. Hopefully the end
state of this series improves matters more. If not, that can be done in
follow-ups.

Queued part 1.

Will look at the subsequent parts soon...

Also, please use "revlog:" in the commit message.

>
> The new function will be useful to retrieve all the revisions which will be
> needed to determine the best delta, and parallelize the computation of the
> necessary diffs.
>
> diff -r 4b68ca118d8d -r 7526dfca3d32 mercurial/revlog.py
> --- a/mercurial/revlog.py       Thu Jan 11 11:57:59 2018 +0000
> +++ b/mercurial/revlog.py       Thu Jan 11 11:59:02 2018 +0100
> @@ -1844,6 +1844,44 @@
>
>          return True
>
> +    def _getcandidaterevs(self, p1, p2, cachedelta):
> +        """
> +        Provides revisions that present an interest to be diffed against,
> +        grouped by level of easiness.
> +        """
>

I'd appreciate a bit more documentation around the intended use of the
generator stream. Essentially, it is emitting iterables of revs and the
first group that yields a rev with a suitable delta is used and the client
stops processing. That's kinda a weird pattern. But it is how the existing
logic works. Using a generator is a clever way to codify that logic.

> +        curr = len(self)
> +        prev = curr - 1
> +        p1r, p2r = self.rev(p1), self.rev(p2)
>

curr, p1r, and p2r are already calculated by the caller. I'd pass them into
this function. Please send a follow-up.

> +
> +        # should we try to build a delta?
> +        if prev != nullrev and self.storedeltachains:
>

We could change this to `if pre == nullrev or not self.storedeltachains:
return` and dedent the following block. That would make things a bit easier
to read.

> +            tested = set()
> +            # This condition is true most of the time when processing
> +            # changegroup data into a generaldelta repo. The only time it
> +            # isn't true is if this is the first revision in a delta chain
> +            # or if ``format.generaldelta=true`` disabled
> ``lazydeltabase``.
> +            if cachedelta and self._generaldelta and self._lazydeltabase:
> +                # Assume what we received from the server is a good choice
> +                # build delta will reuse the cache
> +                yield (cachedelta[0],)
> +                tested.add(cachedelta[0])
> +
> +            if self._generaldelta:
> +                # exclude already lazy tested base if any
> +                parents = [p for p in (p1r, p2r)
> +                           if p != nullrev and p not in tested]
> +                if parents and not self._aggressivemergedeltas:
> +                    # Pick whichever parent is closer to us (to minimize
> the
> +                    # chance of having to build a fulltext).
> +                    parents = [max(parents)]
> +                tested.update(parents)
> +                yield parents
> +
> +            if prev not in tested:
> +                # other approach failed try against prev to hopefully
> save us a
> +                # fulltext.
> +                yield (prev,)
>

Having looked at this code in detail as part of the review, it seems
nonsensical to me to emit prev as a candidate revision when generaldelta is
being used. I'd refactor the code to use separate branches for generaldelta
and non-generaldelta scenarios. But this can be done as follow-up (if it
isn't addressed by later patches in this series).

>      def _addrevision(self, node, rawtext, transaction, link, p1, p2,
> flags,
>                       cachedelta, ifh, dfh, alwayscache=False):
>          """internal function to add revisions to the log
> @@ -1943,42 +1981,16 @@
>          else:
>              textlen = len(rawtext)
>
> -        # should we try to build a delta?
> -        if prev != nullrev and self.storedeltachains:
> -            tested = set()
> -            # This condition is true most of the time when processing
> -            # changegroup data into a generaldelta repo. The only time it
> -            # isn't true is if this is the first revision in a delta chain
> -            # or if ``format.generaldelta=true`` disabled
> ``lazydeltabase``.
> -            if cachedelta and self._generaldelta and self._lazydeltabase:
> -                # Assume what we received from the server is a good choice
> -                # build delta will reuse the cache
> -                candidatedelta = builddelta(cachedelta[0])
> -                tested.add(cachedelta[0])
> +        for candidaterevs in self._getcandidaterevs(p1, p2, cachedelta):
> +            nominateddeltas = []
> +            for candidaterev in candidaterevs:
> +                candidatedelta = builddelta(candidaterev)
>                  if self._isgooddelta(candidatedelta, textlen):
> -                    delta = candidatedelta
> -            if delta is None and self._generaldelta:
> -                # exclude already lazy tested base if any
> -                parents = [p for p in (p1r, p2r)
> -                           if p != nullrev and p not in tested]
> -                if parents and not self._aggressivemergedeltas:
> -                    # Pick whichever parent is closer to us (to minimize
> the
> -                    # chance of having to build a fulltext).
> -                    parents = [max(parents)]
> -                tested.update(parents)
> -                pdeltas = []
> -                for p in parents:
> -                    pd = builddelta(p)
> -                    if self._isgooddelta(pd, textlen):
> -                        pdeltas.append(pd)
> -                if pdeltas:
> -                    delta = min(pdeltas, key=lambda x: x[1])
> -            if delta is None and prev not in tested:
> -                # other approach failed try against prev to hopefully
> save us a
> -                # fulltext.
> -                candidatedelta = builddelta(prev)
> -                if self._isgooddelta(candidatedelta, textlen):
> -                    delta = candidatedelta
> +                    nominateddeltas.append(candidatedelta)
> +            if nominateddeltas:
> +                delta = min(nominateddeltas, key=lambda x: x[1])
> +                break
> +
>          if delta is not None:
>              dist, l, data, base, chainbase, chainlen, compresseddeltalen
> = delta
>          else:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180114/4dc7ef68/attachment.html>