[PATCH] wireproto: use base64 instead of hex for known/getbundle

Martin Geisler mg at lazybytes.net
Tue May 3 03:05:46 CDT 2011


Kevin Bullock <kbullock+mercurial at ringworld.org> writes:

> On May 2, 2011, at 10:31 AM, Steven Brown wrote:
>
>> The node ids each end with '%3D' when sent over HTTP due to the base64
>> padding. Since the node ids have a fixed size, the padding isn't
>> needed.
>
> Will base64 correctly decode a string that doesn't have the padding? I
> think most decoders do. Thus you don't need to explicitly add '=' to
> the end in your proposed from64() below.

No, the module won't allow you to strip the padding:

  >>> base64.b64decode("eA==")
  'x'
  >>> base64.b64decode("eA")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.6/base64.py", line 76, in b64decode
      raise TypeError(msg)
  TypeError: Incorrect padding

One can of course add the necessary padding before decoding.
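
A minimal sketch of re-adding the padding, assuming Python 3 (where the
missing-padding error is binascii.Error rather than the TypeError shown
above). The helper name b64decode_unpadded is hypothetical and not part
of the patch:

```python
import base64


def b64decode_unpadded(data):
    """Decode base64 text whose trailing '=' padding was stripped.

    Hypothetical helper for illustration; not taken from the patch.
    """
    # Valid base64 text has a length that is a multiple of 4, so
    # -len(data) % 4 is the number of '=' characters that are missing.
    return base64.b64decode(data + "=" * (-len(data) % 4))


print(b64decode_unpadded("eA"))
```

Already padded input passes through unchanged, since no extra '=' is
appended when the length is a multiple of 4. For the URL-safe alphabet
used in the example below, base64.urlsafe_b64decode would be the
corresponding decoder.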

The goal of this patch is to use 6 bits/byte instead of 4 bits/byte, and
so cut the length of the query by a third. This assumes that the query
consists almost only of node IDs -- but there are also other parts of
the query, so the savings won't be as big.

The patch has an example where we go from

  GET /?cmd=getbundle HTTP/1.1
  x-hgarg-1:common=c1818a9f5977dd4139a48f93f5425c67d44a9368
                  +ea919464b16e003894c48b6cb68df3cd9411b544
            &heads=6b57ee934bb2996050540f84cdfc8dcad1e7267d
                  +2114148793524fd045998f71a45b0aaf139f752b

to

  GET /?cmd=getbundle HTTP/1.1
  x-hgarg-1:common=wYGKn1l33UE5pI-T9UJcZ9RKk2g%3D
                  +6pGUZLFuADiUxItsto3zzZQRtUQ%3D
            &heads=a1fuk0uymWBQVA-EzfyNytHnJn0%3D
                  +IRQUh5NST9BFmY9xpFsKrxOfdSs%3D

I've wrapped the lines so that we can easily see the savings: each 40
byte hex node becomes 30 bytes (27 base64 characters plus '%3D'), so 10
bytes per node for 40 bytes in total. This is out of a 218 byte query:
an 18% saving. If the %3D bytes go we get a 52/218 = 24% saving.
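
The per-node lengths can be checked with a short sketch, assuming
Python 3 and the URL-safe base64 alphabet (the '-' in the encoded IDs
above suggests that alphabet). The variable names are mine, not from
the patch:

```python
import base64
import binascii
from urllib.parse import quote

# First node ID from the "common" argument in the example above.
node_hex = "c1818a9f5977dd4139a48f93f5425c67d44a9368"

raw = binascii.unhexlify(node_hex)                      # 20 raw bytes
padded = base64.urlsafe_b64encode(raw).decode("ascii")  # ends in '='
escaped = quote(padded, safe="")                        # '=' becomes '%3D'
stripped = padded.rstrip("=")                           # padding removed

for label, value in [("hex", node_hex),
                     ("base64, escaped", escaped),
                     ("base64, no padding", stripped)]:
    print("%-20s %2d  %s" % (label, len(value), value))
```

This prints lengths of 40, 30 and 27 characters for the three forms.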


The commit message says "Saves a couple of bytes per node id being sent
across the wire." but doesn't explain why this is necessary and why the
cost of doing so pays off -- I'm not convinced that this is worth the
extra complexity and obscurity.

Firstly, being able to see the node IDs directly in the log file has
some value. Having to base64 decode a log makes it much less useful.

Secondly, I don't believe the saved bandwidth makes a difference in
practice: the queries are put into packets of a bigger size anyway and
so a ~64 byte difference in length should disappear.

Finally, we now have a proper solution that lets us pack much more data
into a request: the x-hgarg-N headers used above. Though that also
"just" increases the length by a constant amount, that constant is
significantly larger.


-- 
Martin Geisler

Mercurial links: http://mercurial.ch/