clone performance of experimental new http client library.
mpm at selenic.com
Wed Oct 19 12:24:39 CDT 2011
On Wed, 2011-10-19 at 18:52 +0200, Antoine Pitrou wrote:
> On Wed, 19 Oct 2011 11:48:46 -0500
> Matt Mackall <mpm at selenic.com> wrote:
> > <kernel VM expert hat on>
> > On basically all systems, realloc() only works efficiently when there
> > aren't existing neighboring allocations in the virtual address space.
> > And that's entirely workload-dependent. If your app is regularly
> > allocating new chunks of memory (because it's written in a dynamic
> > language like Python, for instance), they're quite likely to interfere
> > with attempts to realloc older blocks and you'll end up with repeated
> > O(N^2) copying.
> <naïve kernel-related question>
> If the memory block (assuming it's a large one) was allocated using
> mmap(), can't mremap() grow it efficiently?
Usually yes, but with caveats. One such caveat is that rewriting and
flushing page tables is actually quite expensive even though userspace
thinks of it as free. You can easily get your O(N^2) behavior back, just
with a smaller constant factor. There is a size range where this works
well, but it definitely has both lower and upper bounds.
> > ..which is one of the reasons I dislike using cStringIO in Mercurial:
> > it's based on realloc.
> It's based on realloc, but it uses overallocation so that the amortized
> cost is still O(n).
Hmm, I see (that's O(nlog n), but close enough).
Modules/cStringIO.c:391. With an overallocation factor of 2 and no
fallback, it's likely to break prematurely when plenty of VM space is
still available. Which is sad for us.
> Conversely, concatenating immutable strings will
> allocate exactly the necessary memory size, which makes repeated
> concatenation O(n**2).
Yep. The join approach is the best in my opinion. Though I frankly wish
Python had implemented a native large string strategy to parallel its
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel