clone performance of experimental new http client library.

Matt Mackall mpm at selenic.com
Wed Oct 19 11:48:46 CDT 2011


On Wed, 2011-10-19 at 18:21 +0200, Antoine Pitrou wrote:
> On Wed, 19 Oct 2011 10:39:22 -0500
> Augie Fackler <durin42 at gmail.com> wrote:
> > > 
> > > It seems as if the cloning goes slower and slower. For example, it starts off pulling the manifests at a good clip and then gets slower and slower till the progress arrow is moving with agonizing slowness by the end. Same for file changes. The progress prediction is always way off.
> > 
> > Can you get me a public repo to test against? I'm happy to spend some time in a profiler and speed things up, but I need a way to test.
> 
> Without knowing too much about this, this sounds like a classic case of
> quadratic behaviour with repeated string concatenation.
> And indeed in httpclient/__init__.py there's the following code:
> 
>         if self._chunked:
>             self._chunked_parsedata(data)
>             return
>         elif self._body is not None:
>             self._body += data
>             return
> 
> where self._body apparently never gets reinitialized until the whole
> response is received.
> 
> Do note that string concatenation is fast on OSes where realloc() is
> smart enough not to copy data (like Linux, I assume).

<kernel VM expert hat on>
On basically all systems, realloc() only works efficiently when there
aren't existing neighboring allocations in the virtual address space.
And that's entirely workload-dependent. If your app is regularly
allocating new chunks of memory (because it's written in a dynamic
language like Python, for instance), they're quite likely to interfere
with attempts to realloc older blocks and you'll end up with repeated
O(N^2) copying.

...which is one of the reasons I dislike using cStringIO in Mercurial:
it's based on realloc.
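
To put a rough number on that worst case: if every append has to move the
whole existing buffer, receiving N chunks of c bytes copies about
c*N*(N-1)/2 bytes in total, versus c*N when the chunks are kept in a list
and joined once. The sketch below is only that arithmetic. It does not
measure any real allocator, and the function names are made up for
illustration.

```python
def bytes_copied_worst_case(num_chunks, chunk_size):
    """Bytes copied if every append must relocate the existing buffer."""
    copied = 0
    current_len = 0
    for _ in range(num_chunks):
        # Worst case: the old contents are moved before the new chunk
        # is appended (no room to grow in place).
        copied += current_len
        current_len += chunk_size
    return copied


def bytes_copied_join(num_chunks, chunk_size):
    """Bytes copied if chunks are kept in a list and joined once."""
    # Each chunk is copied exactly once, into the final string.
    return num_chunks * chunk_size
```

Under these assumptions, 1000 chunks of 4096 bytes come to roughly 2 GB of
copying in the worst case, against about 4 MB for a single join.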

> Intuitively, you should probably use a StringIO or the fast ''.join()
> idiom instead. I just took a look at httplib and it avoids repeated
> concatenation (for instance, HTTPResponse._read_chunked() uses
> ''.join()).

Agreed.
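
For illustration, here is a minimal sketch of the join idiom applied to
body accumulation. It is not the actual httpclient code: the BodyBuffer
class and its method names are hypothetical, and it assumes the chunks
arrive as byte strings.

```python
class BodyBuffer(object):
    """Collects response chunks and joins them only when asked."""

    def __init__(self):
        self._parts = []

    def append(self, data):
        # Appending to a list never copies the bytes already received.
        self._parts.append(data)

    def getvalue(self):
        body = b''.join(self._parts)
        # Keep the joined result as a single part so that repeated
        # calls do not redo the join from scratch.
        self._parts = [body]
        return body
```

With something along these lines, the "self._body += data" step would
become an append, and the full body would be built once when the response
is complete.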

-- 
Mathematics is the supreme nostalgia of our time.



