[PATCH 2 of 5 v4] revlog: add flagprocessor

Rémi Chaintron remi.chaintron at gmail.com
Mon Dec 12 05:00:17 EST 2016


On Sat, 3 Dec 2016 at 02:49 Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

> On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
> > Thanks for the awesome review.
> >
> >     > +
> >     > +    def register(self, transformmap):
> >     > +        """Register transforms to be applied for flags.
> >     > +
> >     > +        ``transformmap`` - a map of flag to transform
> >     > +        """
> >     > +        for flag in transformmap:
> >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> >     > +                self.transformmap[flag] = transformmap[flag]
> >
> >     * unknown flag seems to be silently ignored. It would be better to
> just
> >     fail here.
> >
> >     * we have a mechanism to load extra elements (like flag processors
> are).
> >     Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
> >     API, but that one is probably overkill here)
> >
> >     * we do not handle collision at all, if multiple extensions try to
> >     register a process for the same flag, we should not let the last one
> >     just overwrite the other one. (Abort is the simplest handling).
> >
> >
> > As part of moving the flagprocessor out, I now have a single
> > 'processflags(self, text, flags, reverse=True)' method in revlog, which
> > I updated to take care of unknown flags (which was previously done in
> > 'revlog.revision()').
> >
> > Collisions were explicitly not handled, as we were until now relying on
> > a flag per extension, but this is something I'm happy to discuss.
>
> We cannot exclude the change for a collision, especially because
> multiple extension could use the same flag without having registered it
> with Mercurial ever. In addition, as discussed in Patch4 it would make
> sense to have multiple extensions implement different variant of the
> same flag/behavior.
>

If I understand correctly, what you would like to see is moving towards a
generic flag (for example REVIDX_ISSTOREDEXTERNALLY), that several
extensions can register transforms for.

* My assumption is that we want only one extension to be able to register
transforms on a specific flag. (cf. last point on challenging this).

* An extension that registered a transform needs to be able to change the
transform it registered in order to cover all workflows. Some code paths
require both a read AND write transforms (e.g 'commit' calls revision()
followed by addrevision() and these need different transforms).

* Notwithstanding the fact any extension could force an update of the
transforms dictionary anyway, I'm not sure how to prevent collisions here.
Wouldn't it make sense to let users be responsible for which extensions
they enable?

* If not, I'm a bit unsure as how to enable extension A to register its
transforms while preventing extension B from doing so. I can think of ways
of how to detect genuine errors (i.e a developer enabling two of these
extensions by mistake) but I'm not sure how to keep the whitelisting of a
single extension going as from the point of view of the  flag transforms
dictionary, extensions are anonymous. Note that this would cover only the
aspect of genuine mistakes, not intentional tempering with the dictionary.

* We could also enable several extensions to register different transforms
on the same flag (which, ultimately we might have to in order to avoid
running out of flag bits) but that would require "de-anonymizing"
extensions from the point of view of REVIDX_FLAGS_TRANSFORMS and keeping an
ordering of such extensions to ensure non-commutative operations are
handled gracefully (relying on a double ordering of which flags handle
first and which extensions apply in which order for each flag).

Could you elaborate on what you'd like to see here?




> > The current updated version I'm working on simply relies on the
> > following constant:
> > REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
> >     REVIDX_ISCENSORED: None,
> >     REVIDX_ISLFS: None,
> > })
> > This takes care of both the ordering and a way to register transforms,
> > but this enforces using only one transform per flag.
>
> Not that in the example you provide, You are feeding a dict to
> OrderedDict, so the ordering is lost before reaching the OrderedDict.
>
> In addition, using an ordered dict makes is hard to third party
> extension to add and use any non-declared flag.
>
> >     > +
> >     > +    def unregister(self, transformmap):
> >     > +        """Unregister transforms for flags."""
> >     > +        for flag in transformmap:
> >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> >     > +                self.transformmap[flag] = None
> >
> >     What is the usecase for unregistering? Since we do not call the
> process
> >     if the flag is not set, I can't think of a case were we need to
> >     unregister. Am I missing something?
> >
> >
> > I will get rid of this, as we're moving away from non-flag based
> > transforms (cf. below).
> >
> >
> >     > +    def processflags(self, node, text, revisionflags,
> reverse=False):
> >     > +        """Process flags and apply registered transforms.
> >     > +
> >     > +        ``node`` - the noideid of revision
> >     > +        ``text`` - the revision data to process
> >     > +        ``revisionflags`` - the flags applied to the revision
> >     > +        ``reverse`` - an optional argument describing whether the
> >     flags should
> >     > +          be processed in order according to the
> >     ``flagprocessor`` flag
> >     > +          priority. The flags should be processed in order of
> >     priority when
> >     > +          reading revisions, and in reverse order when writing
> >     revisions.
> >     > +
> >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
> >     ``text`` is the
> >     > +        processed text and ``validatehash`` is a bool indicating
> >     whether the
> >     > +        returned text should be checked for hash integrity.
> >     > +        """
> >     > +        validatehash = True
> >     > +
> >     > +        # Depending on the operation (read or write), the order
> >     might be
> >     > +        # reversed due to non-commutative transforms.
> >     > +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
> >     > +        if reverse is True:
> >
> >     You want 'if reverse' here.
> >
> >
> > (Y)
> >
> >
> >
> >     > +            orderedflags = reversed(orderedflags)
> >     > +
> >     > +        for flag in orderedflags:
> >     > +            # If a transform has been registered for a known
> >     flag, apply it and
> >     > +            # update the result tuple.
> >     > +            # If no flag is set but transforms have been
> >     registered, it is
> >     > +            # assumed that the repository does not send flags
> >     over the wire and
> >     > +            # the extensions implement heuristics to determine
> >     what to do based
> >     > +            # on the contents of the filelog.
> >
> >     This special case on no flag set seems very suspicious to me.
> >
> >     * first it seems like there is two different case here (one were we
> just
> >     use the flag and one were we check if we need to install a flag).
> Having
> >     seem sharing the same code path and rely on heuristic to trigger
> >     themselves seems sloppy, we should be more explicit about the case we
> >     are in.
> >
> >     * This part about "does not send flags over the wire" is a bit
> strange
> >     to me, can you elaborate ? We should enforce that peer involved
> support
> >     exchanging the flag before exchanging content where the flag matters.
> >
> >
> > This is something Durham proposed a while ago, and we re-discussed with
> > Ryan McElroy today. The original idea was to allow repositories to not
> > save flags in the backend, by relying on heuristics instead to determine
> > what to do with filelog contents. We however do not like having to rely
> > on heuristics and would rather enforce sending the flags over the wire,
> > to the cost of updating backend dbs schemas to store the flags
> > (defaulting to zero, which would give REVIDX_DEFAULT_FLAGS as a default
> > value) before enabling lfs.
> >
> > We're going to move away from this hybrid design and rely on flags only
> > in the update.
> >
> >     > +            if flag & revisionflags or revisionflags ==
> >     REVIDX_DEFAULT_FLAGS:
> >     > +                f = self.transformmap.get(flag, None)
> >     > +                if f is not None:
> >     > +                    text, validatehash = f(self.revlogobject,
> text)
> >
> >     If I read it correctly, this code is not checking for unknown flag (a
> >     flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not
> processing
> >     a flag seems very serious and we should not let it pass silently.
> >
> >
> > handled correctly in the new design.
> >
> >
> >     > +
> >     > +        return text, validatehash
> >     > +
> >     >  def hash(text, p1, p2):
> >     >      """generate a hash from the given text and its parent hashes
> >     >
> >     > @@ -285,6 +359,7 @@
> >     >          self.version = v
> >     >          self._inline = v & REVLOGNGINLINEDATA
> >     >          self._generaldelta = v & REVLOGGENERALDELTA
> >     > +        self.flagprocessor = flagprocessor(self)
> >
> >     (note: illustration the cycle we are creating)
> >
> >
> > Will update with removed flagprocessor, removing cycling references.
> >
> >
> >
> >     >          flags = v & ~0xFFFF
> >     >          fmt = v & 0xFFFF
> >     >          if fmt == REVLOGV0 and flags:
> >     > @@ -1221,7 +1296,11 @@
> >     >              bins = bins[1:]
> >     >
> >     >          text = mdiff.patches(text, bins)
> >     > -        self.checkhash(text, node, rev=rev)
> >     > +
> >     > +        text, validatehash = self.processflags(node, text,
> >     self.flags(rev))
> >     > +        if validatehash is True:
> >
> >     'if validatehash'
> >
> >
> > (Y)
> >
> >
> >
> >     > +            self.checkhash(text, node, rev=rev)
> >     > +
> >     >          self._cache = (node, rev, text)
> >     >          return text
> >     >
> >     > @@ -1233,6 +1312,28 @@
> >     >          """
> >     >          return hash(text, p1, p2)
> >     >
> >     > +    def processflags(self, node, text, flags, reverse=False):
> >     > +        """Process flags and apply registered transforms.
> >     > +
> >     > +        cf. `flagprocessor.processflags`` method for argument and
> >     return values
> >     > +        description.
> >     > +
> >     > +        ``node`` - the noideid of revision
> >     > +        ``text`` - the revision data to process
> >     > +        ``flags`` - the flags applied to the revision
> >     > +        ``reverse`` - an optional argument describing whether the
> >     flags should
> >     > +          be processed in order according to the
> >     ``flagprocessor`` flag
> >     > +          priority. The flags should be processed in order of
> >     priority when
> >     > +          reading revisions, and in reverse order when writing
> >     revisions.
> >     > +          This allows for non-reflexive operations in extensions.
> >     > +
> >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
> >     ``text`` is the
> >     > +        processed text and ``validatehash`` is a bool indicating
> >     whether the
> >     > +        returned text should be checked for hash integrity.
> >     > +        """
> >     > +        return self.flagprocessor.processflags(node, text, flags,
> >     > +                                               reverse=reverse)
> >
> >     That function seems to be a plain forward to flagprocessors. Why not
> >     calling the flag processors method directly
> >
> >     (well as pointed above, I think we should entirely skip the object)
> >
> >
> > (Y)
> >
> >
> > --
> > Rémi
> > --
> > Rémi
>
> --
> Pierre-Yves David
>
-- 
Rémi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20161212/e6593219/attachment.html>


More information about the Mercurial-devel mailing list