[PATCH 2 of 5 v4] revlog: add flagprocessor

Rémi Chaintron remi.chaintron at gmail.com
Tue Nov 29 12:16:30 EST 2016


Thanks for the awesome review.
I included the changes in my current version and will update the stack once
I'm done with all required changes.

The approach I'm currently following relies on getting rid of the
flagprocessor object, instead relying on a single revlog.processflags()
method and an OrderedDict containing the flags and their associated
transforms.
The hybrid "no flags" design is going away too, which makes the code really
simpler.

On Tue, 29 Nov 2016 at 06:59 Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>  # max size of revlog with inline data
>  _maxinline = 131072
> @@ -76,6 +80,76 @@
>
>  _nullhash = hashlib.sha1(nullid)
>
> +class flagprocessor(object):
> +    """process revlog objects contents based on flags
> +
> +    flagprocessor objects are the interface between revlog objects and
> +    subclasses/extensions needing to change or update the contents of
revlog
> +    objects, based on flags.
> +
> +    The role of the flagprocessor is to apply transforms registered by
> +    extensions and revlog subclasses in the order defined by
> +    REVIDX_FLAGS_PROCESSING_ORDER and the type (read or write) of
operation.
> +    This allows the flag processor to modify the contents of revlog
objects and
> +    handle the composition of non-commutative operations.
> +    """
> +
> +    def __init__(self, revlogobject):
> +        self.transformmap = {}
> +        self.revlogobject = revlogobject

I'm not sure why we need a full object here, it seems like we just need
a dictionary mapping 'flag' to 'processors' and a function running them.
In the Mercurial code base we try to avoid creating too much new class.
It seems like we could do without the class here (unless I'm missing
something).

In addition to this local custom:

* an individual flagprocessor object is created for each revlog. I'm not
sure why? Can't we just use one mapping (as we do for ordering?)

* having a the revlog referencing the flagprocessor which itself
reference the revlog create a reference cycle. We avoid creating
reference cycle in Mercurial (both for design purpose and because python
is not great with them)


Good point, I have now removed the flagprocessor altogether, which looks
much nicer!


> +
> +    def register(self, transformmap):
> +        """Register transforms to be applied for flags.
> +
> +        ``transformmap`` - a map of flag to transform
> +        """
> +        for flag in transformmap:
> +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> +                self.transformmap[flag] = transformmap[flag]

* unknown flag seems to be silently ignored. It would be better to just
fail here.

* we have a mechanism to load extra elements (like flag processors are).
Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
API, but that one is probably overkill here)

* we do not handle collision at all, if multiple extensions try to
register a process for the same flag, we should not let the last one
just overwrite the other one. (Abort is the simplest handling).


As part of moving the flagprocessor out, I now have a single
'processflags(self, text, flags, reverse=True)' method in revlog, which I
updated to take care of unknown flags (which was previously done in
'revlog.revision()').

Collisions were explicitly not handled, as we were until now relying on a
flag per extension, but this is something I'm happy to discuss.

The current updated version I'm working on simply relies on the following
constant:
REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
    REVIDX_ISCENSORED: None,
    REVIDX_ISLFS: None,
})
This takes care of both the ordering and a way to register transforms, but
this enforces using only one transform per flag.




> +
> +    def unregister(self, transformmap):
> +        """Unregister transforms for flags."""
> +        for flag in transformmap:
> +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> +                self.transformmap[flag] = None

What is the usecase for unregistering? Since we do not call the process
if the flag is not set, I can't think of a case were we need to
unregister. Am I missing something?


I will get rid of this, as we're moving away from non-flag based transforms
(cf. below).


> +    def processflags(self, node, text, revisionflags, reverse=False):
> +        """Process flags and apply registered transforms.
> +
> +        ``node`` - the noideid of revision
> +        ``text`` - the revision data to process
> +        ``revisionflags`` - the flags applied to the revision
> +        ``reverse`` - an optional argument describing whether the flags
should
> +          be processed in order according to the ``flagprocessor`` flag
> +          priority. The flags should be processed in order of priority
when
> +          reading revisions, and in reverse order when writing revisions.
> +
> +        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is
the
> +        processed text and ``validatehash`` is a bool indicating whether
the
> +        returned text should be checked for hash integrity.
> +        """
> +        validatehash = True
> +
> +        # Depending on the operation (read or write), the order might be
> +        # reversed due to non-commutative transforms.
> +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
> +        if reverse is True:

You want 'if reverse' here.


(Y)



> +            orderedflags = reversed(orderedflags)
> +
> +        for flag in orderedflags:
> +            # If a transform has been registered for a known flag, apply
it and
> +            # update the result tuple.
> +            # If no flag is set but transforms have been registered, it
is
> +            # assumed that the repository does not send flags over the
wire and
> +            # the extensions implement heuristics to determine what to
do based
> +            # on the contents of the filelog.

This special case on no flag set seems very suspicious to me.

* first it seems like there is two different case here (one were we just
use the flag and one were we check if we need to install a flag). Having
seem sharing the same code path and rely on heuristic to trigger
themselves seems sloppy, we should be more explicit about the case we
are in.

* This part about "does not send flags over the wire" is a bit strange
to me, can you elaborate ? We should enforce that peer involved support
exchanging the flag before exchanging content where the flag matters.


This is something Durham proposed a while ago, and we re-discussed with
Ryan McElroy today. The original idea was to allow repositories to not save
flags in the backend, by relying on heuristics instead to determine what to
do with filelog contents. We however do not like having to rely on
heuristics and would rather enforce sending the flags over the wire, to the
cost of updating backend dbs schemas to store the flags (defaulting to
zero, which would give REVIDX_DEFAULT_FLAGS as a default value) before
enabling lfs.

We're going to move away from this hybrid design and rely on flags only in
the update.

> +            if flag & revisionflags or revisionflags ==
REVIDX_DEFAULT_FLAGS:
> +                f = self.transformmap.get(flag, None)
> +                if f is not None:
> +                    text, validatehash = f(self.revlogobject, text)

If I read it correctly, this code is not checking for unknown flag (a
flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not processing
a flag seems very serious and we should not let it pass silently.


handled correctly in the new design.


> +
> +        return text, validatehash
> +
>  def hash(text, p1, p2):
>      """generate a hash from the given text and its parent hashes
>
> @@ -285,6 +359,7 @@
>          self.version = v
>          self._inline = v & REVLOGNGINLINEDATA
>          self._generaldelta = v & REVLOGGENERALDELTA
> +        self.flagprocessor = flagprocessor(self)

(note: illustration the cycle we are creating)


Will update with removed flagprocessor, removing cycling references.



>          flags = v & ~0xFFFF
>          fmt = v & 0xFFFF
>          if fmt == REVLOGV0 and flags:
> @@ -1221,7 +1296,11 @@
>              bins = bins[1:]
>
>          text = mdiff.patches(text, bins)
> -        self.checkhash(text, node, rev=rev)
> +
> +        text, validatehash = self.processflags(node, text,
self.flags(rev))
> +        if validatehash is True:

'if validatehash'


(Y)



> +            self.checkhash(text, node, rev=rev)
> +
>          self._cache = (node, rev, text)
>          return text
>
> @@ -1233,6 +1312,28 @@
>          """
>          return hash(text, p1, p2)
>
> +    def processflags(self, node, text, flags, reverse=False):
> +        """Process flags and apply registered transforms.
> +
> +        cf. `flagprocessor.processflags`` method for argument and return
values
> +        description.
> +
> +        ``node`` - the noideid of revision
> +        ``text`` - the revision data to process
> +        ``flags`` - the flags applied to the revision
> +        ``reverse`` - an optional argument describing whether the flags
should
> +          be processed in order according to the ``flagprocessor`` flag
> +          priority. The flags should be processed in order of priority
when
> +          reading revisions, and in reverse order when writing revisions.
> +          This allows for non-reflexive operations in extensions.
> +
> +        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is
the
> +        processed text and ``validatehash`` is a bool indicating whether
the
> +        returned text should be checked for hash integrity.
> +        """
> +        return self.flagprocessor.processflags(node, text, flags,
> +                                               reverse=reverse)

That function seems to be a plain forward to flagprocessors. Why not
calling the flag processors method directly

(well as pointed above, I think we should entirely skip the object)


(Y)


-- 
Rémi
-- 
Rémi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20161129/f959aeef/attachment.html>


More information about the Mercurial-devel mailing list