[PATCH] Increase performance on Windows by up to 2x
Matt Mackall
mpm at selenic.com
Tue Mar 31 17:37:06 CDT 2009
On Tue, 2009-03-31 at 15:15 -0700, Bryan O'Sullivan wrote:
> On Mon, Mar 30, 2009 at 4:35 PM, Matt Mackall <mpm at selenic.com> wrote:
>
> > Python uses regular stdio buffered I/O for file objects. I assume that
> > a win32 read amounts to a system call, so it's naturally more
> > expensive to do a few of those interleaved with a seek or two than one
> > bigger one that obviates the later need for a seek.
>
> Yeah, we can fix that. The only reason not to read all of a .i file is
> if it's really big. So we should read, say, the first 1M unconditionally.
>
> But then it would be necessary to implement some kind of buffer
> management that would essentially duplicate the buffering provided by
> file objects, only presumably more slowly because it would be in pure
> Python? Perhaps I'm missing something.
Here are the paths that exist:

read first 1M of .i
check version
is revlog0?
    hook up revlog0 parser, hand off data
is revlog1?
    is interleaved?
        hook up revlog1 parser, hand off data
        save buffer in our chunk cache too
    is non-interleaved?
        did the read return a full 1M?
            hook up lazy index parser
            prepopulate first 1M of entries
        else
            hook up revlog1 parser, hand off data
In the normal course of things, we want to read the entire .i file in
one go. The only time we don't is for huge indexes[1]. We can do this
with a single open and read; we don't even need to stat for the file
size. As we're only doing a single read in the normal path and always
doing large reads, we don't have any buffer management to speak of.
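The paths above can be sketched roughly like so. This is only an
illustration of the dispatch, not Mercurial's actual code: the parser
names, the FLAG_INLINE bit, and the header layout (version in the low
16 bits of a big-endian word) are assumptions for the sketch.

```python
import struct

CHUNK = 1 << 20          # read the first 1M unconditionally
REVLOGV0, REVLOGV1 = 0, 1
FLAG_INLINE = 1 << 16    # assumed bit marking interleaved (inline) data

def parse_v0(data):      # stub parsers standing in for the real ones
    return ("revlog0", data)

def parse_v1(data):
    return ("revlog1", data)

def lazy_index(path, prefill):
    # would hook up the lazy parser and prepopulate entries from prefill
    return ("lazy", prefill)

def open_index(path):
    # One open and one large read; a short read means the whole .i
    # file is already in the buffer, so no stat for file size is needed.
    with open(path, "rb") as f:
        data = f.read(CHUNK)
    header = struct.unpack(">I", data[:4])[0] if len(data) >= 4 else 0
    version = header & 0xFFFF
    if version == REVLOGV0:
        return parse_v0(data)
    if version == REVLOGV1:
        if header & FLAG_INLINE:
            index = parse_v1(data)
            # save the same buffer in the chunk cache too (not shown)
            return index
        if len(data) == CHUNK:           # buffer full: index may be huge
            return lazy_index(path, prefill=data)
        return parse_v1(data)
    raise ValueError("unknown revlog version %d" % version)
```

The point of the sketch is that every branch works from the one buffer
already read; only the huge-index branch ever touches the file again.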
[1] where we actually almost always end up (slowly) parsing the whole
damn thing anyway to find a hash, so we might just consider always
reading the entire index. lazyparser mostly exists to optimize the
case where we're looking at the tip of the changelog.
--
http://selenic.com : development and support for Mercurial and Linux