Could we use an unrepr module?

Martin Geisler mg at aragost.com
Fri May 6 02:17:33 CDT 2011


Matt Mackall <mpm at selenic.com> writes:

> On Thu, 2011-05-05 at 19:13 +0200, Benoit Boissinot wrote:
>> On Thu, May 5, 2011 at 6:59 PM, Martin Geisler <mg at lazybytes.net> wrote:
>> > Brodie Rao <brodie at bitheap.org> writes:
>> >
>> >> On Thu, May 5, 2011 at 8:17 AM, Martin Geisler <mg at aragost.com> wrote:
>> >>> Hi guys,
>> >>>
>> >>> I needed a way to serialize data for the lock extension, so I
>> >>> wrote a small module that reverses the normal repr function in
>> >>> Python. It is like eval, but does not execute anything.
>> >>>
>> >>> I think we could use such a module here and there in Mercurial.
>> >>> As an example, I happened to look at the code that writes the
>> >>> merge state:
>> >>
>> >> Maybe I'm missing something here, but why not just use pickle?
>> >
>> > Both pickle and the simpler marshal are unsafe since they can end
>> > up executing data. I'm not really sure why they would do that, but
>> > the documentation warns about it for both modules.
>> 
>> And why not something like JSON?
>
> JSON doesn't know about bytes, only characters.

Yeah, this was actually what made me switch from JSON to repr in my
extension. I started out with ad-hoc serialization using \0 and \n and
so on. That quickly became boring so I switched to JSON via demjson (a
random single-file JSON library I found).

But because I needed to send both Unicode strings (branch names) and
byte strings (file names) over the wire, I ran into problems. I believe
I solved them by giving repr(filename) to the JSON library, and then
decided that I could skip the JSON step altogether.
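
To illustrate the mixed-string problem, here is a minimal sketch. It
assumes Python 3 (where bytes and text are separate types), the payload
and its keys are made up for the example, and ast.literal_eval from the
standard library stands in for unrepr, whose code is not shown here:

```python
import ast
import json

# Hypothetical payload: a text branch name and byte-string file names.
payload = {"branch": "default", "files": [b"caf\xe9.txt", b"README"]}

# JSON has no byte-string type, so the standard json module rejects bytes.
try:
    json.dumps(payload)
    json_ok = True
except TypeError:
    json_ok = False

# repr keeps the two string types distinct. ast.literal_eval parses
# literals only; it does not call functions or look up names.
wire = repr(payload)
restored = ast.literal_eval(wire)

print(json_ok)
print(restored == payload)
```

The repr text survives the round trip with the byte strings still being
byte strings, which is the property I was after.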

The unrepr module is much simpler than a full JSON module and therefore
somewhat faster:

  $ python -m timeit -s 'from json import dumps' \
                        'dumps([123, "foo", "bar"])'
  100000 loops, best of 3: 11.9 usec per loop

  $ python -m timeit 'repr([123, "foo", "bar"])'
  1000000 loops, best of 3: 0.873 usec per loop

  $ python -m timeit -s 'from json import loads, dumps' \
                     -s 'd = dumps([123, "foo", "bar"])' \
                        'loads(d)'
  10000 loops, best of 3: 38.9 usec per loop

  $ python -m timeit -s 'from unrepr import unrepr' \
                     -s 'd = repr([123, "foo", "bar"])' \
                        'unrepr(d)'
  100000 loops, best of 3: 16 usec per loop

The raw speed is not so important for my purpose, though.
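
What matters more is that the parser never executes its input. As a
rough sketch of that property (not the actual module), the function
below is a hypothetical stand-in built on ast.literal_eval from the
standard library; the real unrepr may well work differently:

```python
import ast


def unrepr(text):
    """Hypothetical stand-in for the module's unrepr function."""
    # ast.literal_eval accepts only literals (numbers, strings, bytes,
    # tuples, lists, dicts, sets, booleans, None) and raises ValueError
    # for anything else, such as a function call.
    return ast.literal_eval(text)


data = [123, "foo", "bar"]
roundtrip = unrepr(repr(data))

# Input that eval would happily run is rejected instead.
try:
    unrepr("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(roundtrip == data, rejected)
```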

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/

