[PATCH 3 of 3] listkeypattern: add listkeypattern wireproto method

Pierre-Yves David pierre-yves.david at ens-lyon.org
Mon Aug 22 08:12:24 EDT 2016



On 08/17/2016 10:08 PM, Durham Goode wrote:
> On 8/16/16 5:22 PM, Pierre-Yves David wrote:
>>
>>
>> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash at fb.com
>>> <mailto:stash at fb.com>> wrote:
>>>
>>>     # HG changeset patch
>>>     # User Stanislau Hlebik <stash at fb.com <mailto:stash at fb.com>>
>>>     # Date 1470999441 25200
>>>     #      Fri Aug 12 03:57:21 2016 -0700
>>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>>     listkeypattern: add listkeypattern wireproto method
>>>
>>>     wireproto method to list remote keys by pattern
>>>
>>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>>     --- a/mercurial/wireproto.py
>>>     +++ b/mercurial/wireproto.py
>>>     @@ -353,6 +353,22 @@
>>>                            % (namespace, len(d)))
>>>              yield pushkeymod.decodekeys(d)
>>>
>>>     +    @batchable
>>>     +    def listkeypattern(self, namespace, patterns):
>>>     +        if not self.capable('pushkey'):
>>>     +            yield {}, None
>>>     +        f = future()
>>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>>     "%s"\n' %
>>>     +                      (namespace, patterns))
>>>     +        yield {
>>>     +            'namespace': encoding.fromlocal(namespace),
>>>     +            'patterns': encodelist(patterns)
>>>     +        }, f
>>>     +        d = f.value
>>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>>     +                      % (namespace, len(d)))
>>>     +        yield pushkeymod.decodekeys(d)
>>>     +
>>>          def stream_out(self):
>>>              return self._callstream('stream_out')
>>>
>>>     @@ -676,7 +692,8 @@
>>>          return repo.opener.tryread('clonebundles.manifest')
>>>
>>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap',
>>> 'pushkey',
>>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>>     +                 'listkeypattern']
>>>
>>>      def _capabilities(repo, proto):
>>>          """return a list of capabilities for a repo
>>>     @@ -791,6 +808,12 @@
>>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>>          return pushkeymod.encodekeys(d)
>>>
>>>     + at wireprotocommand('listkeypattern', 'namespace patterns *')
>>>
>>>
>>> Why the "*" here? "others" is not used in the function implementation.
>>>
>>>
>>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>>     +    patterns = decodelist(patterns)
>>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>>     patterns=patterns).items()
>>>     +    return pushkeymod.encodekeys(d)
>>>     +
>>>      @wireprotocommand('lookup', 'key')
>>>      def lookup(repo, proto, key):
>>>          try:
>>>
>>>
>>> I think introducing a new wire protocol command is the correct way to
>>> solve this problem (as opposed to introducing a new argument on the
>>> existing command).
>>>
>>> However, if we're introducing a new wire protocol command for obtaining
>>> pushkey values, I think we should improve deficiencies in the response
>>> encoding rather than propagate its problems.
>>>
>>> The "listkeys" response encoding can't transmit the full range of binary
>>> values. This can lead to larger (and slower) responses sizes. For
>>> example, a number of pushkey namespaces exchange lists of nodes. These
>>> have to be represented as hex instead of binary. For pushkey namespaces
>>> like phases or obsolescence that can exchange hundreds or thousands of
>>> nodes, the overhead can add up.
>>>
>>> I think the response from a new listkeys command should be using framing
>>> to encode the key names and values so the full range of binary values
>>> can be efficiently transferred. We may also want a special mechanism to
>>> represent a list of nodes, as avoiding the overhead of framing on fixed
>>> width values would be desirable.
>>>
>>> Of course, at the point you introduce a new response encoding, we may
>>> want to call the command "listkeys2." If others agree, I can code up an
>>> implementation and you can add the patterns functionality on top of it.
>>
>> Sorry to be a bit late to the discussion.
>>
>> I don't think we should introduce a new wire-protocol command for this.
>>
>> Individual listkey call have been a large source of race condition and
>> related issue. Bundle2 is able to carry listkey.pushkey call just fine
>> and I think we should prioritize its usage. As bundle2 already have
>> framing, we could just use your better encoding with bundle2 directly.
>> However, we should probably push things further and use dedicated part
>> for commonly used request. Having dedicated type will help more
>> semantic reply and request. The series we are commenting on is a good
>> example of that need for "pattern" here pretty much only apply to
>> bookmark, having a dedicated "channel" (within bundle2) for this would
>> make is painless to add any new arguments we could need.
> If we create a bundle2 part for transmitting bookmark information, what
> does the request flow look like?  getbundle seems to take kwargs, so
> would all requests piggy back their query parameters into the getbundle
> kwargs?  Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*"
> as arguments that the receiver would know to produce a bookmarkpart that
> matches that parameter (and know not to send a changegroup)?  Seems like
> a recipe for problems if every requestable part type overloads the same
> kwargs.

This is how it work right now (and how extension extend features), it 
requires some care to not step over each other but have been 
satisfactory so far.

> Is the alternative to use the unbundle protocol to send empty parts with
> parameters as the way of requesting data?  Then the response is sent as
> a reply part?  Would that then eventually deprecate getbundle entirely?

There is rarely a 1:1 mapping between argument and part. Some argument 
are need by multiple parts and some data are sent in multiple part. So 
I'm not sure that would help much.

> I'm just trying to understand how we might implement the listkeypattern
> functionality using bundle2 without having to refactor bundle2 much.

simpler proposal I can see is:

- add a capability to say we have a dedicated bookmark part (and 
associated part)
- add and use bookmarkpattern argument to request bookmark (instead of 
requesting for a bookmark listkey part) use '*' in the generic /not 
patter case

In practice we might a bit richer way to specify bookmark. For example I 
can imagine requesting "all bookmark on the pulled set of changeset" 
being something useful.

Cheers,

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list