clone performance of experimental new http client library.

Augie Fackler lists at durin42.com
Wed Oct 19 11:58:32 CDT 2011


On Wed, Oct 19, 2011 at 11:48 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Wed, 2011-10-19 at 18:21 +0200, Antoine Pitrou wrote:
>> On Wed, 19 Oct 2011 10:39:22 -0500
>> Augie Fackler <durin42 at gmail.com> wrote:
>> > >
>> > > It seems as if the cloning goes slower and slower. For example, it starts off pulling the manifests at a good clip and then gets slower and slower till the progress arrow is moving with agonizing slowness by the end. Same for file changes. The progress prediction is always way off.
>> >
>> > Can you get me a public repo to test against? I'm happy to spend some time in a profiler and speed things up, but I need a way to test.
>>
>> Without knowing too much about this, this sounds like a classic case of
>> quadratic behaviour with repeated string concatenation.
>> And indeed in httpclient/__init__.py there's the following code:
>>
>>         if self._chunked:
>>             self._chunked_parsedata(data)
>>             return
>>         elif self._body is not None:
>>             self._body += data
>>             return
>>
>> where self._body apparently never gets reinitialized until the whole
>> response is received.
>>
>> Do note that string concatenation is fast on OSes where realloc() is
>> smart enough not to copy data (like Linux, I assume).
>
> <kernel VM expert hat on>
> On basically all systems, realloc() only works efficiently when there
> aren't existing neighboring allocations in the virtual address space.
> And that's entirely workload-dependent. If your app is regularly
> allocating new chunks of memory (because it's written in a dynamic
> language like Python, for instance), they're quite likely to interfere
> with attempts to realloc older blocks, and you'll end up copying the
> whole buffer again and again, for O(N^2) total work.
>
> ...which is one of the reasons I dislike using cStringIO in Mercurial:
> it's based on realloc.
>
>> Intuitively, you should probably use a StringIO or the fast ''.join()
>> idiom instead. I just took a look at httplib and it avoids repeated
>> concatenation (for instance, HTTPResponse._read_chunked() uses
>> ''.join()).
>
> Agreed.

Yeah, I've been looking at making this more efficient as I have time
today (I'd already reached that conclusion on my own, but I'd still
like a repo I can profile so I can be sure the fix is a big enough
improvement), and I'll dedicate some time to it this weekend if I
can't fix it today. It shouldn't be hard; it just requires a little
thinking.
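A minimal sketch of the ''.join() idiom Antoine and Matt recommend, applied to the non-chunked branch quoted above (`self._body += data`). It is written in Python 3 with bytes, and `BodyAccumulator`, `feed()` and `getvalue()` are hypothetical names for illustration, not httpclient's actual API. Each incoming piece is appended to a list and the pieces are joined once when the body is needed, so each byte is copied a constant number of times instead of once per later read.

```python
class BodyAccumulator:
    """Hypothetical stand-in for a response object's body handling.

    Instead of doing body += data on every network read (which can copy
    the whole body each time, O(N^2) bytes copied overall), keep the
    pieces in a list and join them once when the body is needed.
    """

    def __init__(self):
        self._pieces = []   # received pieces, in arrival order
        self._length = 0    # total bytes received so far

    def feed(self, data):
        # Appending to a list is amortized O(1); no received bytes are copied.
        self._pieces.append(data)
        self._length += len(data)

    def __len__(self):
        return self._length

    def getvalue(self):
        # One join copies every byte exactly once: O(N) overall.
        body = b"".join(self._pieces)
        # Collapse the list so repeated calls don't re-join every piece.
        self._pieces = [body]
        return body


acc = BodyAccumulator()
for piece in (b"abc", b"def", b"ghi"):
    acc.feed(piece)
print(acc.getvalue())  # b'abcdefghi'
```

A cStringIO buffer would also avoid `+=`, but as Matt notes it is itself built on realloc(); a list of pieces plus a single join does not depend on the allocator being able to grow a block in place.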


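Antoine also points out that httplib's HTTPResponse._read_chunked() collects its pieces and joins them with ''.join(). The sketch below applies the same pattern to a chunked-transfer body. It is not httpclient's _chunked_parsedata() (whose code isn't shown here): `decode_chunked` is a hypothetical helper that assumes the complete payload is already in memory, ignores chunk extensions and discards trailers, whereas a real client has to parse incrementally as data arrives.

```python
def decode_chunked(payload):
    """Decode a complete HTTP/1.1 chunked-transfer-encoded body.

    Hypothetical helper, minimal sketch: assumes the whole payload is in
    memory, ignores chunk extensions after ';' and discards trailers.
    """
    pieces = []  # decoded chunk data, joined once at the end
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        eol = payload.index(b"\r\n", pos)  # ValueError if no CRLF
        size_field = payload[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            # Last chunk; optional trailers and the final CRLF follow.
            break
        pieces.append(payload[pos:pos + size])
        pos += size
        # Chunk data must be followed by its own CRLF.
        if payload[pos:pos + 2] != b"\r\n":
            raise ValueError("truncated chunk or missing CRLF after chunk data")
        pos += 2
    return b"".join(pieces)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```

Collecting the chunks in a list keeps the decoder linear in the body size no matter how many chunks the server sends.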

More information about the Mercurial-devel mailing list