[PATCH 1 of 2 "V3] compression: introduce an official `zstd-revlog` requirement

Gregory Szorc gregory.szorc at gmail.com
Tue Apr 16 06:28:55 EDT 2019


On Mon, Apr 15, 2019 at 8:34 AM Augie Fackler <raf at durin42.com> wrote:

> Greg: ping on these? I'm aware you had some unease on this topic, and
> I'd like to figure out if we should ship this. Thanks!
>

I've been enjoying my holiday away from the computer. I met with
Pierre-Yves and Georges yesterday and agreed to review these before the
freeze...


>
> On Tue, Apr 09, 2019 at 12:06:00PM +0200, Pierre-Yves David wrote:
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david at octobus.net>
> > # Date 1553707623 -3600
> > #      Wed Mar 27 18:27:03 2019 +0100
> > # Node ID 8ce788576265eb8b84cb697ea2e09b2246fb7b81
> > # Parent  456c37433c433ee59d321315da24eb944e495f08
> > # EXP-Topic zstd-revlog
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #              hg pull https://bitbucket.org/octobus/mercurial-devel/
> -r 8ce788576265
> > compression: introduce an official `zstd-revlog` requirement
> >
> > This requirement supersedes `exp-compression-zstd`. However, we keep
> support for
> > the old requirement.
> >
> > Strictly speaking, we do not need to add a new requirement, there are no
> logic
> > change making "new" repo incompatible with mercurial client using a
> version
> > that support `exp-compression-zstd`.
> >
> > The choice to introduce a new requirement is motivated by the following:
> >
> > * The previous requirement was explicitly "experimental". Using it by
> default
> >   could confuse users.
> >
> > * adding support for a hypothetical third compression engine will
> requires new
> >   code, and will comes with its own requirement tool.
> >
> > * We won't use it as the default for a while since I do not think we
> support
> >   zstd on all platform. I can imagine we'll gain another (unrelated but
> on my
> >   default) requirement by the time we turn this zstd by default.
>

So, the reason why I named things "compression-<format>" was so we could
have a per-compressor requirement that could potentially allow *anything*
to use that compressor. For example, "compression-zstd" could allow random
on-disk files to be compressed with zstd.

Of course, *any* time you make a BC change to how the repository is read
from the filesystem, you need a new requirement, otherwise old clients may
attempt to open the repo and get confused or cause corruption. So even if
we have an e.g. "compression-zstd" requirement, we would still need
additional requirements to denote when zstd support is added to individual
components. e.g. we'd need a dedicated requirement for "zstd in revlogs."

I think I'm OK with the approach in this patch.



> >
> > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > --- a/mercurial/localrepo.py
> > +++ b/mercurial/localrepo.py
> > @@ -645,6 +645,8 @@ def gathersupportedrequirements(ui):
> >          engine = util.compengines[name]
> >          if engine.revlogheader():
> >              supported.add(b'exp-compression-%s' % name)
> > +            if engine.name() == 'zstd':
> > +                supported.add(b'revlog-compression-zstd')
>

I'm still not a fan of special casing zstd: there should be something on
the compression engine API doing this. But I'm willing to look the other
way for now because doing it as part of the compression engine API means
stabilizing the "exp-compression-<format>" bit and I don't want to do that
just yet.

There is also an existing bug where we fail to check engine.available().
This will result in Mercurial thinking it can open zstd repositories and
then failing when it tries to use zstd when it isn't available.
(Compression engines are always present/registered but they may not be
available at run-time due to missing dependencies.)

Could you please send a patch (either standalone or a v4 of this one) that
adds a check for engine.available() and skips engines that aren't usable?


> >
> >      return supported
> >
> > @@ -794,8 +796,13 @@ def resolverevlogstorevfsoptions(ui, req
> >          options[b'maxchainlen'] = maxchainlen
> >
> >      for r in requirements:
> > -        if r.startswith(b'exp-compression-'):
> > -            options[b'compengine'] = r[len(b'exp-compression-'):]
> > +        # we allow multiple compression engine requirement to co-exist
> because
> > +        # strickly speaking, revlog seems to support mixed compression
> style.
> > +        #
> > +        # The compression used for new entries will be "the last one"
> > +        prefix = r.startswith
> > +        if prefix('revlog-compression-') or prefix('exp-compression-'):
> > +            options[b'compengine'] = r.split('-', 2)[2]
>

An existing problem with this code is that `requirements` is a set and
iteration order will be undefined. So if we have multiple compressors in
play, we could inconsistently compress with different compressors for
different transactions! We should have some way to sort the priorities. But
this is scope bloat and an existing issue, so I'm inclined to ignore it for
now. An inline comment would be useful, however.


> >
> >      options[b'zlib.level'] = ui.configint(b'storage',
> b'revlog.zlib.level')
> >      if options[b'zlib.level'] is not None:
> > @@ -2929,7 +2936,9 @@ def newreporequirements(ui, createopts):
> >                                   'compression engines'))
> >
> >      # zlib is the historical default and doesn't need an explicit
> requirement.
> > -    if compengine != 'zlib':
> > +    elif compengine == 'zstd':
> > +        requirements.add('revlog-compression-zstd')
> > +    elif compengine != 'zlib':
> >          requirements.add('exp-compression-%s' % compengine)
> >
> >      if scmutil.gdinitconfig(ui):
> > diff --git a/mercurial/upgrade.py b/mercurial/upgrade.py
> > --- a/mercurial/upgrade.py
> > +++ b/mercurial/upgrade.py
> > @@ -325,10 +325,16 @@ class compressionengine(formatvariant):
> >
> >      @classmethod
> >      def fromrepo(cls, repo):
> > +        # we allow multiple compression engine requirement to co-exist
> because
> > +        # strickly speaking, revlog seems to support mixed compression
> style.
> > +        #
> > +        # The compression used for new entries will be "the last one"
> > +        compression = 'zlib'
> >          for req in repo.requirements:
> > -            if req.startswith('exp-compression-'):
> > -                return req.split('-', 2)[2]
> > -        return 'zlib'
> > +            prefix = req.startswith
> > +            if prefix('revlog-compression-') or
> prefix('exp-compression-'):
> > +                compression = req.split('-', 2)[2]
> > +        return compression
> >
> >      @classmethod
> >      def fromconfig(cls, repo):
> > diff --git a/tests/test-repo-compengines.t
> b/tests/test-repo-compengines.t
> > --- a/tests/test-repo-compengines.t
> > +++ b/tests/test-repo-compengines.t
> > @@ -44,9 +44,9 @@ A requirement specifying an unknown comp
> >    $ cd zstd
> >    $ cat .hg/requires
> >    dotencode
> > -  exp-compression-zstd
> >    fncache
> >    generaldelta
> > +  revlog-compression-zstd
> >    revlogv1
> >    sparserevlog
> >    store
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20190416/9978fc2e/attachment.html>


More information about the Mercurial-devel mailing list