Could we use an unrepr module?
Martin Geisler
mg at aragost.com
Fri May 6 02:17:33 CDT 2011
Matt Mackall <mpm at selenic.com> writes:
> On Thu, 2011-05-05 at 19:13 +0200, Benoit Boissinot wrote:
>> On Thu, May 5, 2011 at 6:59 PM, Martin Geisler <mg at lazybytes.net> wrote:
>> > Brodie Rao <brodie at bitheap.org> writes:
>> >
>> >> On Thu, May 5, 2011 at 8:17 AM, Martin Geisler <mg at aragost.com> wrote:
>> >>> Hi guys,
>> >>>
>> >>> I needed a way to serialize data for the lock extension, so I
>> >>> wrote a small module that reverses the normal repr function in
>> >>> Python. It is like eval, but does not execute anything.
>> >>>
>> >>> I think we could use such a module here and there in Mercurial.
>> >>> As an example, I happened to look at the code that writes the
>> >>> merge state:
>> >>
>> >> Maybe I'm missing something here, but why not just use pickle?
>> >
>> > Both pickle and the simpler marshal are unsafe since they can end
>> > up executing data. I'm not really sure why they would do that, but
>> > the documentation warns about it for both modules.
>>
>> And why not something like JSON?
>
> JSON doesn't know about bytes, only characters.

Yeah, this was actually what made me switch from JSON to repr in my
extension. I started out with ad-hoc serialization using \0 and \n and
so on. That quickly became boring so I switched to JSON via demjson (a
random single-file JSON library I found).

But because I needed to send both Unicode strings (branch names) and
byte strings (file names) over the wire, I ran into problems. I believe
I solved them by giving repr(filename) to the JSON library, and then
decided that I could just skip the JSON step altogether.
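
To illustrate the bytes problem, here is a minimal sketch. It assumes
Python 3 and uses the standard ast.literal_eval as a stand-in for
unrepr, whose code is not shown in this message; the payload names are
made up for the example.

```python
import ast
import json

# Hypothetical payload: a Unicode branch name and a byte-string file name.
payload = {"branch": "d\u00e9faut", "file": b"caf\xe9.txt"}

# JSON has no bytes type, so the standard json module rejects the payload.
try:
    json.dumps(payload)
    json_ok = True
except TypeError:
    json_ok = False

# repr() keeps the str/bytes distinction. A literal-only parser
# (ast.literal_eval here, standing in for unrepr) restores the value
# without executing anything.
text = repr(payload)
restored = ast.literal_eval(text)

# Anything that is not a plain literal is refused instead of being run.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The point of the sketch is only that the round trip through repr keeps
byte strings as byte strings, which JSON cannot express directly.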

The unrepr module is much simpler than a full JSON module and therefore
somewhat faster:

  $ python -m timeit -s 'from json import dumps' \
      'dumps([123, "foo", "bar"])'
  100000 loops, best of 3: 11.9 usec per loop

  $ python -m timeit 'repr([123, "foo", "bar"])'
  1000000 loops, best of 3: 0.873 usec per loop

  $ python -m timeit -s 'from json import loads, dumps' \
      -s 'd = dumps([123, "foo", "bar"])' \
      'loads(d)'
  10000 loops, best of 3: 38.9 usec per loop

  $ python -m timeit -s 'from unrepr import unrepr' \
      -s 'd = repr([123, "foo", "bar"])' \
      'unrepr(d)'
  100000 loops, best of 3: 16 usec per loop

The raw speed is not so important for my purpose, though.
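
The same comparison can be run from a script. This is only a sketch:
it assumes Python 3 and again uses ast.literal_eval in place of unrepr,
so the numbers will differ from the ones quoted above and say nothing
about unrepr itself. The helper name best is made up for the example.

```python
import ast
import json
import timeit

data = [123, "foo", "bar"]
json_text = json.dumps(data)
repr_text = repr(data)


def best(func, number=10000, repeat=3):
    """Return the best time per call in microseconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


# ast.literal_eval is an assumption standing in for unrepr.
timings = {
    "json.dumps": best(lambda: json.dumps(data)),
    "repr": best(lambda: repr(data)),
    "json.loads": best(lambda: json.loads(json_text)),
    "ast.literal_eval": best(lambda: ast.literal_eval(repr_text)),
}

for name, usec in timings.items():
    print("%-18s %8.3f usec per call" % (name, usec))
```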
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/