[PATCH 01 of 19] revlog: drop duplicated code

Gregory Szorc gregory.szorc at gmail.com
Mon Sep 10 12:52:27 EDT 2018


On Sat, Sep 8, 2018 at 3:57 AM Boris Feld <boris.feld at octobus.net> wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld at octobus.net>
> # Date 1536087921 -7200
> #      Tue Sep 04 21:05:21 2018 +0200
> # Node ID bdcfd96d87254a508e85a6eb502ffe4c57845bc8
> # Parent  6268fed317d04dd8f0430468818ea5d0529b6503
> # EXP-Topic sparse-snapshot
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> bdcfd96d8725
> revlog: drop duplicated code
>

Queued this series.

Thank you for the detailed explanations in the commit messages and the
inline comments. It really helps to grok how all of this works.

I find this work around sparse revlogs and more intelligent delta chains
*very* interesting. And the results obviously speak for themselves.

It's worth noting that when we have alternate storage backends that may not
be based on append-only storage and linear access models, more complex
delta strategies like this make a *lot* more sense and assuming the
performance impact is reasonable, should obviously be the preferred
mechanism.

I'm also a fan of limiting the max chain length: setting an upper bound on
the chain length starts to set an upper bound on the maximum work that
needs to be performed at read/write time. While not addressed by this
series, another potential area for improvement is setting a bounded upper
limit on the total size of stored deltas: since decompression and patch
application is somewhat linearly proportional to the input data size,
bounding the total size of data that needs to be decoded in order to
resolve a revision fulltext would potentially be a *better* bounding
mechanism than chain length itself. I'll be honest, I'm not sure what the
heuristic would be to determine what that bounded ceiling should be. IMO it
is worth investigating though.

Regarding performance, there are a handful of quadratic algorithms in this
code. We have several calls to deltaparent() both directly and indirectly.
I feel like this will add significant overhead when adding multiple
revisions to large revlogs (>100,000 revisions). What I'm not sure about is
how this overhead compares to the overhead of resolving parent revisions
and computing deltas against them. I expect revision resolution and delta
computation to take most of the CPU. But at the same time, quadratic
algorithms coupled with Python overhead can make things very slow. I look
forward to the follow-up series addressing performance.


>
> This code probably got duplicated by a rebase/evolve conflict. We drop the
> extra copy, the other copy is right below.
>
> This had no real effects since other logic ensure that we never test the
> same
> revision twice.
>
> diff --git a/mercurial/revlogutils/deltas.py
> b/mercurial/revlogutils/deltas.py
> --- a/mercurial/revlogutils/deltas.py
> +++ b/mercurial/revlogutils/deltas.py
> @@ -621,19 +621,6 @@ def _rawgroups(revlog, p1, p2, cachedelt
>      curr = len(revlog)
>      prev = curr - 1
>
> -    # should we try to build a delta?
> -    if prev != nullrev and revlog._storedeltachains:
> -        tested = set()
> -        # This condition is true most of the time when processing
> -        # changegroup data into a generaldelta repo. The only time it
> -        # isn't true is if this is the first revision in a delta chain
> -        # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
> -        if cachedelta and gdelta and revlog._lazydeltabase:
> -            # Assume what we received from the server is a good choice
> -            # build delta will reuse the cache
> -            yield (cachedelta[0],)
> -            tested.add(cachedelta[0])
> -
>      # This condition is true most of the time when processing
>      # changegroup data into a generaldelta repo. The only time it
>      # isn't true is if this is the first revision in a delta chain
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180910/8afe389a/attachment.html>


More information about the Mercurial-devel mailing list