[Bug 5533] New: json encoder is too slow

mercurial-bugs at mercurial-scm.org
Tue Apr 11 20:52:47 UTC 2017


https://bz.mercurial-scm.org/show_bug.cgi?id=5533

            Bug ID: 5533
           Summary: json encoder is too slow
           Product: Mercurial
           Version: default branch
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: feature
          Priority: wish
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: arcppzju+hgbug at gmail.com
                CC: mercurial-devel at mercurial-scm.org

I wrote a simple program to compare the performance of the stdlib json module
with the JSON escaping routine we have in core:

    from mercurial import encoding
    import contextlib
    import json
    import time

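    # Serialize a flat dict of strings as one JSON object, escaping every key
    # and value with core's encoding.jsonescape (Python 2 code).
    def hgescape(obj):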
        s = '{'
        s += ','.join('"%s":"%s"' % (encoding.jsonescape(k),
                                     encoding.jsonescape(v))
                      for k, v in obj.iteritems())
        s += '}'
        return s

    @contextlib.contextmanager
    def measure(name):
        t1 = time.time()
        yield
        t2 = time.time()
        print('%s: %s' % (name, t2 - t1))

    lines = []
    with measure('insert 50k lines'):
        for l in xrange(50000):
            lines.append({'author': 'test',
                          'commit': 'fe4713a645e44df4bbaeb8a04ea428a2d1c82a4b',
                          'date': '1999-99-99'})

    with measure('stdlib json escape'):
        s = json.dumps(lines)

    with measure('hg json escape'):
        s = ','.join([hgescape(l) for l in lines])

I got something like:

    insert 50k lines: 0.0199460983276
    stdlib json escape: 0.0517330169678
    hg json escape: 1.18240094185

So the core hg JSON escaping is roughly 23x slower (1.18s versus 0.052s).
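
For anyone who wants to see the shape of this gap without a Mercurial checkout,
here is a minimal sketch. It assumes a per-character pure-Python escaper is a
reasonable stand-in for the slow path; slow_escape, slow_dumps and compare are
hypothetical names and are not Mercurial's code, so the absolute numbers will
differ from the ones above.

```python
import json
import timeit

# Hypothetical stand-in for a pure-Python escaper; NOT encoding.jsonescape.
_ESCAPES = {'"': '\\"', '\\': '\\\\', '\n': '\\n', '\r': '\\r',
            '\t': '\\t', '\b': '\\b', '\f': '\\f'}


def slow_escape(s):
    """Escape a string for JSON one character at a time."""
    out = []
    for ch in s:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters need the \uXXXX form.
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


def slow_dumps(rows):
    """Serialize a list of flat str->str dicts as a JSON array."""
    objs = []
    for row in rows:
        body = ','.join('"%s":"%s"' % (slow_escape(k), slow_escape(v))
                        for k, v in row.items())
        objs.append('{' + body + '}')
    return '[' + ','.join(objs) + ']'


rows = [{'author': 'test',
         'commit': 'fe4713a645e44df4bbaeb8a04ea428a2d1c82a4b',
         'date': '1999-99-99'}] * 5000


def compare(repeat=3):
    """Return (stdlib seconds, pure-Python seconds), best of `repeat` runs."""
    t_std = min(timeit.repeat(lambda: json.dumps(rows),
                              number=1, repeat=repeat))
    t_slow = min(timeit.repeat(lambda: slow_dumps(rows),
                               number=1, repeat=repeat))
    return t_std, t_slow


if __name__ == '__main__':
    t_std, t_slow = compare()
    print('stdlib: %f  pure python: %f' % (t_std, t_slow))
```

The stdlib encoder normally runs its escaping loop in C, while the stand-in pays
interpreter overhead for every character, which is the kind of cost the numbers
above point at.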

That means things like "annotate -Tjson" can spend noticeable time just doing
the formatting.

I can think of two paths worth a try:

  1. Write the json encoding logic in C.
  2. Write a general-purpose string-like object in C that supports two
     operations, `+` and `x.join`, in a zero-copy manner. This would increase
     the burden on the GC, though.

I'm not sure whether any existing library already does option 2.
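
To make option 2 more concrete, here is a rough Python sketch of the semantics
such an object might have. LazyStr is a hypothetical name, not an existing
library or Mercurial API, and a Python version would not itself be faster. It
only illustrates that `+` and `join` can keep references to their operands and
copy the characters once, when the final string is requested.

```python
class LazyStr(object):
    """Hypothetical rope-like string that defers copying until str()."""

    __slots__ = ('_parts',)

    def __init__(self, *parts):
        # Each part is either a plain str or another LazyStr.
        self._parts = parts

    def __add__(self, other):
        # No characters are copied here; only two references are stored.
        return LazyStr(self, other)

    def __radd__(self, other):
        # Supports a plain str on the left-hand side: 'x' + LazyStr(...).
        return LazyStr(other, self)

    def join(self, items):
        # Interleave the separator (self) with the items, by reference.
        parts = []
        for i, item in enumerate(items):
            if i:
                parts.append(self)
            parts.append(item)
        return LazyStr(*parts)

    def __str__(self):
        # Walk the tree iteratively and copy every fragment exactly once.
        out = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, LazyStr):
                # Reverse so the leftmost part is popped first.
                stack.extend(reversed(node._parts))
            else:
                out.append(node)
        return ''.join(out)


if __name__ == '__main__':
    obj = LazyStr('{') + LazyStr(',').join(['"a":"1"', '"b":"2"']) + '}'
    print(str(obj))
```

Every `+` and `join` allocates a small node that stays alive until the final
string is built, which is where the extra GC burden mentioned above would come
from.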


