[PATCH 3 of 3] listkeypattern: add listkeypattern wireproto method

Mon Aug 22 22:34:59 EDT 2016

On Wed, Aug 17, 2016 at 1:08 PM, Durham Goode <durham at fb.com> wrote:

> On 8/16/16 5:22 PM, Pierre-Yves David wrote:
>
>>
>>
>> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>>
>>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash at fb.com
>>> <mailto:stash at fb.com>> wrote:
>>>
>>>     # HG changeset patch
>>>     # User Stanislau Hlebik <stash at fb.com <mailto:stash at fb.com>>
>>>     # Date 1470999441 25200
>>>     #      Fri Aug 12 03:57:21 2016 -0700
>>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>>     listkeypattern: add listkeypattern wireproto method
>>>
>>>     wireproto method to list remote keys by pattern
>>>
>>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>>     --- a/mercurial/wireproto.py
>>>     +++ b/mercurial/wireproto.py
>>>     @@ -353,6 +353,22 @@
>>>                            % (namespace, len(d)))
>>>              yield pushkeymod.decodekeys(d)
>>>
>>>     +    @batchable
>>>     +    def listkeypattern(self, namespace, patterns):
>>>     +        if not self.capable('pushkey'):
>>>     +            yield {}, None
>>>     +        f = future()
>>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>>     "%s"\n' %
>>>     +                      (namespace, patterns))
>>>     +        yield {
>>>     +            'namespace': encoding.fromlocal(namespace),
>>>     +            'patterns': encodelist(patterns)
>>>     +        }, f
>>>     +        d = f.value
>>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>>     +                      % (namespace, len(d)))
>>>     +        yield pushkeymod.decodekeys(d)
>>>     +
>>>          def stream_out(self):
>>>              return self._callstream('stream_out')
>>>
>>>     @@ -676,7 +692,8 @@
>>>          return repo.opener.tryread('clonebundles.manifest')
>>>
>>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap',
>>> 'pushkey',
>>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>>     +                 'listkeypattern']
>>>
>>>      def _capabilities(repo, proto):
>>>          """return a list of capabilities for a repo
>>>     @@ -791,6 +808,12 @@
>>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>>          return pushkeymod.encodekeys(d)
>>>
>>>     + at wireprotocommand('listkeypattern', 'namespace patterns *')
>>>
>>>
>>> Why the "*" here? "others" is not used in the function implementation.
>>>
>>>
>>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>>     +    patterns = decodelist(patterns)
>>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>>     patterns=patterns).items()
>>>     +    return pushkeymod.encodekeys(d)
>>>     +
>>>      @wireprotocommand('lookup', 'key')
>>>      def lookup(repo, proto, key):
>>>          try:
>>>
>>>
>>> I think introducing a new wire protocol command is the correct way to
>>> solve this problem (as opposed to introducing a new argument on the
>>> existing command).
>>>
>>> However, if we're introducing a new wire protocol command for obtaining
>>> pushkey values, I think we should improve deficiencies in the response
>>> encoding rather than propagate its problems.
>>>
>>> The "listkeys" response encoding can't transmit the full range of binary
>>> values. This can lead to larger (and slower) responses sizes. For
>>> example, a number of pushkey namespaces exchange lists of nodes. These
>>> have to be represented as hex instead of binary. For pushkey namespaces
>>> like phases or obsolescence that can exchange hundreds or thousands of
>>> nodes, the overhead can add up.
>>>
>>> I think the response from a new listkeys command should be using framing
>>> to encode the key names and values so the full range of binary values
>>> can be efficiently transferred. We may also want a special mechanism to
>>> represent a list of nodes, as avoiding the overhead of framing on fixed
>>> width values would be desirable.
>>>
>>> Of course, at the point you introduce a new response encoding, we may
>>> want to call the command "listkeys2." If others agree, I can code up an
>>> implementation and you can add the patterns functionality on top of it.
>>>
>>
>> Sorry to be a bit late to the discussion.
>>
>> I don't think we should introduce a new wire-protocol command for this.
>>
>> Individual listkey call have been a large source of race condition and
>> related issue. Bundle2 is able to carry listkey.pushkey call just fine and
>> I think we should prioritize its usage. As bundle2 already have framing, we
>> could just use your better encoding with bundle2 directly.
>> However, we should probably push things further and use dedicated part
>> for commonly used request. Having dedicated type will help more semantic
>> reply and request. The series we are commenting on is a good example of
>> that need for "pattern" here pretty much only apply to bookmark, having a
>> dedicated "channel" (within bundle2) for this would make is painless to add
>> any new arguments we could need.
>>
> If we create a bundle2 part for transmitting bookmark information, what
> does the request flow look like?  getbundle seems to take kwargs, so would
> all requests piggy back their query parameters into the getbundle kwargs?
> Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*" as
> arguments that the receiver would know to produce a bookmarkpart that
> matches that parameter (and know not to send a changegroup)?  Seems like a
> recipe for problems if every requestable part type overloads the same
> kwargs.
>
> Is the alternative to use the unbundle protocol to send empty parts with
> parameters as the way of requesting data?  Then the response is sent as a
> reply part?  Would that then eventually deprecate getbundle entirely?
>
> I'm just trying to understand how we might implement the listkeypattern
> functionality using bundle2 without having to refactor bundle2 much.
>

I don't think overloading "unbundle" to retrieve data is a good idea. The
support for "unbundle" returning a bundle2 is as far as I remember, a
feature added by Facebook to support the pushrebase extension and there are
no consumers of the feature in core. Either way, I think preserving the
semantic difference between "getbundle" being used for pull and "unbundle"
being used for push is important. I could see us accidentally doing things
like obtaining a lock while processing a pull-originated "unbundle" and
that's not something we'd want to happen.

To create a bundle2 for transmitting bookmark information, the server will
need a new capacity that basically says "I support receiving getbundle
arguments related to bookmarks." The client will need to advertise bundle2
capabilities saying it knows how to receive cert bookmarks related bundle2
parts. Then you teach exchange._pullbundle2 to send said arguments. Then a
@getbundle2partsgenerator in exchange.py takes care of adding the part on
the server if it was requested and a @parthandler in bundle2.py takes care
of doing something with it on the client. Look at d29201352af7 and
0c2ded041d10 for part of this. The only thing that doesn't do is advertise
the server capability (because we didn't need to pass arguments to
"getbundle").

And yes, having a shared namespace of arguments for every requestable part
type could cause chaos. It means that any extension adding an argument
could potentially conflict with a future core-provided argument. This is
one reason why I think commands should be immutable in core: if core adds a
new argument, it will expose it via e.g. getbundle<N>. Extensions would
have to explicitly monkeypatch the new command and they would probably see
they were in conflict with a core-provided argument when doing so. The
downside to this is extensions adding arguments to wire protocol commands
become out of date since an upgraded client could start using a new wire
protocol command which the extension doesn't yet know about. Something like
bundle2's separate parts constituting namespaces for arguments would help
here. But even then, we don't have a good story for what part names for 3rd
party bundle2 parts should be. Protocol design is hard.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160822/fc2be4b3/attachment.html>