[PATCH 01 of 10 V2] util: create new abstraction for compression engines
Augie Fackler
raf at durin42.com
Wed Nov 9 12:32:13 EST 2016
On Mon, Nov 07, 2016 at 07:13:49PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1478572299 28800
> # Mon Nov 07 18:31:39 2016 -0800
> # Node ID f3c9da54ff5e23becaa4d0e90a20c9de704a70ba
> # Parent 0911191dc4c97cbc8334c8b83782e8134bf621f0
> util: create new abstraction for compression engines
Can this code live in (say) "compression.py"? util is already shaggy,
and I'd rather not make it more of a dumping ground.
>
> Currently, util.py has "compressors" and "decompressors" dicts
> mapping compression algorithms to callables returning objects that
> perform well-defined operations. In addition, revlog.py has code
> for calling into a compressor or decompressor explicitly. And, there
> is code in the wire protocol for performing zlib compression.
>
> The 3rd party lz4revlog extension has demonstrated the utility of
> supporting alternative compression formats for revlog storage. But
> it stops short of supporting lz4 for bundles and the wire protocol.
>
> There are also plans to support zstd as a general compression
> replacement.
>
> So, there appears to be a market for a unified API for registering
> compression engines. This commit starts the process of establishing
> one.
>
> This commit establishes a base class/interface for defining
> compression engines and how they will be used. A collection class
> to hold references to registered compression engines has also been
> introduced.
>
> The built-in zlib, bz2, truncated bz2, and no-op compression engines
> are registered with a singleton instance of the collection class.
>
> The compression engine API will change once consumers are ported
> to the new API and some common patterns can be simplified at the
> engine API level. So don't get too attached to the API...
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -2856,13 +2856,219 @@ class ctxmanager(object):
> raise exc_val
> return received and suppressed
>
> -# compression utility
> +# compression code
> +
> +class compressormanager(object):
> + """Holds registrations of various compression engines.
> +
> + This class essentially abstracts the differences between compression
> + engines to allow new compression formats to be added easily, possibly from
> + extensions.
> +
> + Compressors are registered against the global instance by calling its
> + ``register()`` method.
> + """
> + def __init__(self):
> + self._engines = {}
> + # Bundle spec human name to engine name.
> + self._bundlenames = {}
> + # Internal bundle identifier to engine name.
> + self._bundletypes = {}
> +
> + def __getitem__(self, key):
> + return self._engines[key]
> +
> + def __contains__(self, key):
> + return key in self._engines
> +
> + def __iter__(self):
> + return iter(self._engines.keys())
> +
> + def register(self, engine):
> + """Register a compression engine with the manager.
> +
> + The argument must be a ``compressionengine`` instance.
> + """
> + if not isinstance(engine, compressionengine):
> + raise ValueError(_('argument must be a compressionengine'))
> +
> + name = engine.name()
> +
> + if name in self._engines:
> + raise error.Abort(_('compression engine %s already registered') %
> + name)
> +
> + bundleinfo = engine.bundletype()
> + if bundleinfo:
> + bundlename, bundletype = bundleinfo
> +
> + if bundlename in self._bundlenames:
> + raise error.Abort(_('bundle name %s already registered') %
> + bundlename)
> + if bundletype in self._bundletypes:
> + raise error.Abort(_('bundle type %s already registered by %s') %
> + (bundletype, self._bundletypes[bundletype]))
> +
> + # No external facing name declared.
> + if bundlename:
> + self._bundlenames[bundlename] = name
> +
> + self._bundletypes[bundletype] = name
> +
> + self._engines[name] = engine
> +
> + @property
> + def supportedbundlenames(self):
> + return set(self._bundlenames.keys())
> +
> + @property
> + def supportedbundletypes(self):
> + return set(self._bundletypes.keys())
> +
> + def forbundlename(self, bundlename):
> + """Obtain a compression engine registered to a bundle name.
> +
> + Will raise KeyError if the bundle type isn't registered.
> + """
> + return self._engines[self._bundlenames[bundlename]]
> +
> + def forbundletype(self, bundletype):
> + """Obtain a compression engine registered to a bundle type.
> +
> + Will raise KeyError if the bundle type isn't registered.
> + """
> + return self._engines[self._bundletypes[bundletype]]
> +
> +compengines = compressormanager()
> +
> +class compressionengine(object):
> + """Base class for compression engines.
> +
> + Compression engines must implement the interface defined by this class.
> + """
> + def name(self):
> + """Returns the name of the compression engine.
> +
> + This is the key the engine is registered under.
> +
> + This method must be implemented.
> + """
> + raise NotImplementedError()
> +
> + def bundletype(self):
> + """Describes bundle identifiers for this engine.
> +
> + If this compression engine isn't supported for bundles, returns None.
> +
> + If this engine can be used for bundles, returns a 2-tuple of strings of
> + the user-facing "bundle spec" compression name and an internal
> + identifier used to denote the compression format within bundles. To
> + exclude the name from external usage, set the first element to ``None``.
> +
> + If bundle compression is supported, the class must also implement
> + ``compressorobj`` and `decompressorreader``.
> + """
> + return None
> +
> + def compressorobj(self):
> + """(Temporary) Obtain an object used for compression.
> +
> + The returned object has ``compress(data)`` and ``flush()`` methods.
> + These are used to incrementally feed data chunks into a compressor.
> + """
> + raise NotImplementedError()
> +
> + def decompressorreader(self, fh):
> + """Perform decompression on a file object.
> +
> + Argument is an object with a ``read(size)`` method that returns
> + compressed data. Return value is an object with a ``read(size)`` that
> + returns uncompressed data.
> + """
> + raise NotImplementedError()
> +
> +class _zlibengine(compressionengine):
> + def name(self):
> + return 'zlib'
> +
> + def bundletype(self):
> + return 'gzip', 'GZ'
> +
> + def compressorobj(self):
> + return zlib.compressobj()
> +
> + def decompressorreader(self, fh):
> + def gen():
> + d = zlib.decompressobj()
> + for chunk in filechunkiter(fh):
> + yield d.decompress(chunk)
> +
> + return chunkbuffer(gen())
Squinting at this, could compressionengine be a namedtuple that has
four fields? nothing ever uses self in these, so I think compressorobj
and decompressorreader could both be free functions that get tucked
into a namedtuple.
[elided rest of file that is routine]
More information about the Mercurial-devel
mailing list