[PATCH 3 of 3] listkeypattern: add listkeypattern wireproto method

Wed Aug 17 08:05:53 EDT 2016

On 08/17/2016 03:39 AM, Gregory Szorc wrote:
> On Tue, Aug 16, 2016 at 5:22 PM, Pierre-Yves David
> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> wrote:
>
>
>
>     On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>
>         On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash at fb.com
>         <mailto:stash at fb.com>
>         <mailto:stash at fb.com <mailto:stash at fb.com>>> wrote:
>
>             # HG changeset patch
>             # User Stanislau Hlebik <stash at fb.com <mailto:stash at fb.com>
>         <mailto:stash at fb.com <mailto:stash at fb.com>>>
>
>             # Date 1470999441 25200
>             #      Fri Aug 12 03:57:21 2016 -0700
>             # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>             # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>             listkeypattern: add listkeypattern wireproto method
>
>             wireproto method to list remote keys by pattern
>
>             diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>             --- a/mercurial/wireproto.py
>             +++ b/mercurial/wireproto.py
>             @@ -353,6 +353,22 @@
>                                    % (namespace, len(d)))
>                      yield pushkeymod.decodekeys(d)
>
>             +    @batchable
>             +    def listkeypattern(self, namespace, patterns):
>             +        if not self.capable('pushkey'):
>             +            yield {}, None
>             +        f = future()
>             +        self.ui.debug('preparing listkeys for "%s" with pattern
>             "%s"\n' %
>             +                      (namespace, patterns))
>             +        yield {
>             +            'namespace': encoding.fromlocal(namespace),
>             +            'patterns': encodelist(patterns)
>             +        }, f
>             +        d = f.value
>             +        self.ui.debug('received listkey for "%s": %i bytes\n'
>             +                      % (namespace, len(d)))
>             +        yield pushkeymod.decodekeys(d)
>             +
>                  def stream_out(self):
>                      return self._callstream('stream_out')
>
>             @@ -676,7 +692,8 @@
>                  return repo.opener.tryread('clonebundles.manifest')
>
>              wireprotocaps = ['lookup', 'changegroupsubset',
>         'branchmap', 'pushkey',
>             -                 'known', 'getbundle', 'unbundlehash', 'batch']
>             +                 'known', 'getbundle', 'unbundlehash', 'batch',
>             +                 'listkeypattern']
>
>              def _capabilities(repo, proto):
>                  """return a list of capabilities for a repo
>             @@ -791,6 +808,12 @@
>                  d = repo.listkeys(encoding.tolocal(namespace)).items()
>                  return pushkeymod.encodekeys(d)
>
>             + at wireprotocommand('listkeypattern', 'namespace patterns *')
>
>
>         Why the "*" here? "others" is not used in the function
>         implementation.
>
>
>             +def listkeypattern(repo, proto, namespace, patterns, others):
>             +    patterns = decodelist(patterns)
>             +    d = repo.listkeys(encoding.tolocal(namespace),
>             patterns=patterns).items()
>             +    return pushkeymod.encodekeys(d)
>             +
>              @wireprotocommand('lookup', 'key')
>              def lookup(repo, proto, key):
>                  try:
>
>
>         I think introducing a new wire protocol command is the correct
>         way to
>         solve this problem (as opposed to introducing a new argument on the
>         existing command).
>
>         However, if we're introducing a new wire protocol command for
>         obtaining
>         pushkey values, I think we should improve deficiencies in the
>         response
>         encoding rather than propagate its problems.
>
>         The "listkeys" response encoding can't transmit the full range
>         of binary
>         values. This can lead to larger (and slower) responses sizes. For
>         example, a number of pushkey namespaces exchange lists of nodes.
>         These
>         have to be represented as hex instead of binary. For pushkey
>         namespaces
>         like phases or obsolescence that can exchange hundreds or
>         thousands of
>         nodes, the overhead can add up.
>
>         I think the response from a new listkeys command should be using
>         framing
>         to encode the key names and values so the full range of binary
>         values
>         can be efficiently transferred. We may also want a special
>         mechanism to
>         represent a list of nodes, as avoiding the overhead of framing
>         on fixed
>         width values would be desirable.
>
>         Of course, at the point you introduce a new response encoding,
>         we may
>         want to call the command "listkeys2." If others agree, I can
>         code up an
>         implementation and you can add the patterns functionality on top
>         of it.
>
>
>     Sorry to be a bit late to the discussion.
>
>     I don't think we should introduce a new wire-protocol command for this.
>
>     Individual listkey call have been a large source of race condition
>     and related issue. Bundle2 is able to carry listkey.pushkey call
>     just fine and I think we should prioritize its usage. As bundle2
>     already have framing, we could just use your better encoding with
>     bundle2 directly.
>     However, we should probably push things further and use dedicated
>     part for commonly used request. Having dedicated type will help more
>     semantic reply and request. The series we are commenting on is a
>     good example of that need for "pattern" here pretty much only apply
>     to bookmark, having a dedicated "channel" (within bundle2) for this
>     would make is painless to add any new arguments we could need.
>
>     TL;DR; I don't think we should touch listkey.pushkey. add a bundle2
>     part dedicated to bookmark instead.
>
>
> I have serious objections to stuffing yet more functionality into the
> "getbundle" wire protocol command. That command is quickly becoming a
> "god object" [1]. I'd rather we freeze that wire protocol command and
> offer equivalent functionality under new, better designed commands.

As our wireprotocol is stateless, it is very important that we have 
commands allowing to fetch a consistent set of information in a single 
call. I'm not saying getbundle is the best interface for that and  I'm 
not against introducing a new command, but that command still need to 
allow to fetch many different information in one call. That might even 
requires a rework of the wireprotocol (I've not put too much thinking 
into it) even.

Introducing a new command is a significant step. Again, If someone want 
to do the work to design and implement such things I'm all in favor. In 
the mean time people who want to extend the current feature set should 
go through the existing. bundle2 for data transfer and have at least one 
way to access the data through getbundle.

> In general, I'm opposed to adding arguments to wire protocol commands.
> Introducing an argument requires advertising a server capability
> otherwise clients may send an unknown argument to an incompatible
> server, which the server will reject. At the point you introduce a new
> argument/capability, you've effectively introduced a new variation of
> the wire protocol command. You might as well call it something else so
> the semantics of each wire protocol are well-defined and constant over
> time. If nothing else, this makes the client/server code easier to
> understand. I only need to point out wireproto.getbundle() and
> wireproto.unbundle() for examples of overly complicated code as a result
> of introducing wire protocol command arguments (notably support for
> bundle2).

Because we need a single command to consistently fetch all data in an 
extensible way, we'll need that entry point taking arbitrary argument. 
This is actually part of the large gains of bundle2: (1) fetching 
multiple data race-free (2) easy to extend and alter from extension.

I'm not sure I really grasp you point about new arguments and 
capabilities. It is the client responsability to properly reads server 
capabilities and communicate with the server accordingly. This have been 
working reasonably well so far.

However, I see your point than mixing bundle1 and bundle2 into the same 
getbundle command might have been a mistake. It wasn't too apparent when 
we designed it.

> I'm not strongly opposed to the idea of making bundle2 a generic framing
> protocol. I think we could do better. (I raised a number of concerns
> with bundle2 during its implementation phase. In fact, one of them was
> that bundle2 as a generic format for data in transport and at rest is
> not ideal: there are various aspects you want to optimize for and one
> format cannot exceed at all of them. Unfortunately, not much was done
> because bundle2 was already kinda/sorta deployed in a few environments
> at that time. My fault for taking too long to audit the protocol.) I
> think bundle2 is an OK'ish format for data at rest. For a transport
> protocol, not so much.

I entirely agree with you here. Your feedback on how bundle2 could have 
been better are valid (but late) and I'm certainly open to the idea of 
the bundle3 replacing bundle2. However, until someone actually 
implements that bundle3, bundle2 is the best thing we have on the market 
and we should reuse it as much as possible.

An important aspect of that is that we want a bundle2+ version of any 
data we could fetch in order to ensure we can transport it alongside 
associated data (during pull/push) and we can store it on disk.
Since we must a bundle2+ representation of that data, it does not seems 
very useful to introduce another representation/encoding of it.
(The above paragraph disregard the

> If we do use bundle2 as a container for the wire protocol, that doesn't
> mean we can't have multiple wire protocol commands for exchanging
> bundle2 payloads.

Yep, using bundle2 for transporting command data seems reasonable.

However, there is a argument similar to the one in the previous block to 
be made. The data we want to exchange most certainly needs to be carried 
alongside other in a consistent way. So we at minimum needs a support 
for fetching it through getbundle (or a better command serving the same 
purpose). And once we have a way to obtain the data through gebundle+ 
building a second way to fetch just the very same data does not seems 
very useful.

> [1] https://en.wikipedia.org/wiki/God_object

I think I have been battling god object in Mercurial for long enough to 
know what they are ;-)

Cheers,

-- 
Pierre-Yves David