[PATCH 01 of 10] revlog: add clone method

Mon Nov 21 21:01:21 EST 2016

On 11/21/2016 10:14 PM, Gregory Szorc wrote:
> On Mon, Nov 21, 2016 at 12:37 PM, Augie Fackler <raf at durin42.com
> <mailto:raf at durin42.com>> wrote:
>
>     On Sat, Nov 05, 2016 at 09:40:17PM -0700, Gregory Szorc wrote:
>     > # HG changeset patch
>     > # User Gregory Szorc <gregory.szorc at gmail.com <mailto:gregory.szorc at gmail.com>>
>     > # Date 1478405835 25200
>     > #      Sat Nov 05 21:17:15 2016 -0700
>     > # Node ID ebbd8d975e4bf59b2bdd44736fdf13222988d1a4
>     > # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
>     > revlog: add clone method
>     >
>     > Upcoming patches will introduce functionality for in-place
>     > repository/store "upgrades." Copying the contents of a revlog
>     > feels sufficiently low-level to warrant being in the revlog
>     > class.
>     >
>     > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>     > --- a/mercurial/revlog.py
>     > +++ b/mercurial/revlog.py
>     > @@ -1818,3 +1818,32 @@ class revlog(object):
>     >          if not self._inline:
>     >              res.append(self.datafile)
>     >          return res
>     > +
>     > +    def clone(self, tr, destrevlog, addrevisioncb=None):
>     > +        """Copy the contents of this revlog to another revlog.
>     > +
>     > +        The destination revlog will contain the same revisions and nodes.
>
>     It might be worth calling out that this copying is done via repeated
>     calls to addrevision, so delta recomputation will take place. Which is
>     to say this is not a cheap clone, but rather a very expensive clone,
>     as it's doing a full decompress-undelta-delta-compress cycle.
>
>
> In my unpublished v2 of this series, I've added an argument to clone()
> that controls delta reuse. I believe I also changed the default to feed
> the cached delta into addrevision(), which will prevent the full delta
> cycle if the delta has the revisions that would be chosen by
> addrevision(). I believe this is a reasonable default, as the only
> benefit to a full delta cycle is running bdiff again and hoping to get a
> better result. And we don't need that unless we significantly change
> bdiff's behavior. Even then, it is quite cost prohibitive, so I'm fine
> hiding that behind a --slow argument or some such.

+1 for this unpublished change. Being able to "cheaply" upgrade a 
repository to general delta and benefit from it moving forward would 
already be a massive plus. Having a way to do the slow upgrade is useful 
but can come later.

>     > +        However, it may not be bit-for-bit identical due to e.g. delta encoding
>     > +        differences.
>     > +        """
>     > +        if len(destrevlog):
>     > +            raise ValueError(_('destination revlog is not empty'))
>     > +
>     > +        index = self.index
>     > +        for rev in self:
>     > +            entry = index[rev]
>     > +
>     > +            # Some classes override linkrev to take filtered revs into
>     > +            # account. Use raw entry from index.
>     > +            linkrev = entry[4]
>     > +            p1 = index[entry[5]][7]
>     > +            p2 = index[entry[6]][7]
>     > +            node = entry[7]
>     > +            # FUTURE we could optionally allow reusing the delta to avoid
>     > +            # expensive recomputation.
>     > +            text = self.revision(rev)
>     > +
>     > +            destrevlog.addrevision(text, tr, linkrev, p1, p2, node=node)
>     > +
>     > +            if addrevisioncb:
>     > +                addrevisioncb(self, rev, node)

nits: I prefer explicit test for None. This othewise always eventually 
come back to bites you at some point.

I assume the callback is here to allow progress report. It is probably 
worth mentioning it in the patch description.

-- 
Pierre-Yves David