[PATCH] Increase performance on Windows by up to 2x
Matt Mackall
mpm at selenic.com
Tue Mar 31 17:37:06 CDT 2009
On Tue, 2009-03-31 at 15:15 -0700, Bryan O'Sullivan wrote:
> On Mon, Mar 30, 2009 at 4:35 PM, Matt Mackall <mpm at selenic.com> wrote:
>
> > Python uses regular stdio buffered I/O for file objects. I assume that
> > a win32 read amounts to a system call, so it's naturally more
> > expensive to do a few of those interleaved with a seek or two than one
> > bigger one that obviates the later need for a seek.
>
> Yeah, we can fix that. The only reason not to read all of a .i file is
> if it's really big. So we should read, say, the first 1M unconditionally.
>
> But then it would be necessary to implement some kind of buffer
> management that would essentially duplicate the buffering provided by
> file objects, only presumably more slowly because it would be in pure
> Python? Perhaps I'm missing something.
Here are the paths that exist:

read first 1M of .i
check version
is revlog0?
    hook up revlog0 parser, hand off data
is revlog1?
    is interleaved?
        hook up revlog1 parser, hand off data
        save buffer in our chunk cache too
    is non-interleaved?
        did the read return a full 1M?
            hook up lazy index parser
            prepopulate first 1M of entries
        else
            hook up revlog1 parser, hand off data
In the normal course of things, we want to read the entire .i file in
one go. The only time we don't is for huge indexes[1]. We can do this
with a single open and read; we don't even need to stat for the file
size. As we're only doing a single read in the normal path and always
doing large reads, we don't have any buffer management to speak of.
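The paths above can be sketched roughly like so. This is only an
illustration of the dispatch, not Mercurial's actual code: the parser
names, the FLAG_INLINE bit, and the header layout (version in the low
16 bits of a big-endian word) are assumptions for the sketch.

```python
import struct

CHUNK = 1 << 20          # read the first 1M unconditionally
REVLOGV0, REVLOGV1 = 0, 1
FLAG_INLINE = 1 << 16    # assumed bit marking interleaved (inline) data

def parse_v0(data):      # stub parsers standing in for the real ones
    return ("revlog0", data)

def parse_v1(data):
    return ("revlog1", data)

def lazy_index(path, prefill):
    # would hook up the lazy parser and prepopulate entries from prefill
    return ("lazy", prefill)

def open_index(path):
    # One open and one large read; a short read means the whole .i
    # file is already in the buffer, so no stat for file size is needed.
    with open(path, "rb") as f:
        data = f.read(CHUNK)
    header = struct.unpack(">I", data[:4])[0] if len(data) >= 4 else 0
    version = header & 0xFFFF
    if version == REVLOGV0:
        return parse_v0(data)
    if version == REVLOGV1:
        if header & FLAG_INLINE:
            index = parse_v1(data)
            # save the same buffer in the chunk cache too (not shown)
            return index
        if len(data) == CHUNK:           # buffer full: index may be huge
            return lazy_index(path, prefill=data)
        return parse_v1(data)
    raise ValueError("unknown revlog version %d" % version)
```

The point of the sketch is that every branch works from the one buffer
already read; only the huge-index branch ever touches the file again.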
[1] where we actually almost always end up (slowly) parsing the whole
damn thing anyway to find a hash, so we might just consider always
reading the entire index. lazyparser mostly exists to optimize the
case where we're looking at the tip of the changelog.
--
http://selenic.com : development and support for Mercurial and Linux