[PATCH 0 of 1 ] Wrong distance calculation in revlog causes huge manifests

Thu Jul 12 09:13:29 CDT 2012

On Thu, 2012-07-12 at 07:53 +0200, Noel Grandin wrote:
> On 2012-07-12 01:19, Matt Mackall wrote:
> > Note that we intentionally do _a single read_ to pull in the entire 
> > span of deltas, including data we're skipping. This is important for 
> > performance on spinning media: reads are blocking and you have to wait 
> > for every read to complete before issuing another one.
> 
> If that's the case, perhaps we should be using async IO, so that we can 
> issue multiple reads at once, and then wait for them all to complete.

Could be done, if we weren't using Python and there were a nice portable
AIO interface.

> Given that all of the major OS's implement some variation of the 
> elevator algorithm for disk reads, that would give us the performance we 
> want without the memory overhead of reading extra data.

FYI, Linux actually doesn't use an elevator-style algorithm and hasn't
for quite some time. It instead uses a time-slicing algorithm with
fairness guarantees:

https://en.wikipedia.org/wiki/CFQ

For our purposes, this is... pretty great. CFQ will actually wait around
a bit for us to issue the next read request before getting bored and
servicing another process. But we still have to wait for each read to
complete before issuing the next, which means a pipeline stall.
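
As a rough illustration of the trade-off, here is a minimal sketch. It is
not the actual revlog code: read_span, read_each, demo and the
(offset, length) chunk list are hypothetical names made up for this
example. The first function does one blocking read over the whole span and
discards the gaps in memory; the second issues one blocking read per
chunk, and each of those has to finish before the next can be issued.

```python
import os
import tempfile


def read_span(path, chunks):
    """One read from the first chunk's start to the last chunk's end.

    chunks is a list of (offset, length) pairs (a hypothetical layout,
    standing in for the deltas we want). Data between chunks is read
    too and simply sliced away afterwards.
    """
    start = min(off for off, _ in chunks)
    end = max(off + length for off, length in chunks)
    with open(path, "rb") as f:
        f.seek(start)
        span = f.read(end - start)  # the single blocking read
    return [span[off - start:off - start + length] for off, length in chunks]


def read_each(path, chunks):
    """One seek and one blocking read per chunk, issued one after another."""
    out = []
    with open(path, "rb") as f:
        for off, length in chunks:
            f.seek(off)
            out.append(f.read(length))  # must complete before the next read
    return out


def demo():
    """Write a small scratch file and fetch the same chunks both ways."""
    data = bytes(range(256)) * 4
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        chunks = [(10, 5), (100, 20), (900, 8)]
        return read_span(path, chunks), read_each(path, chunks), data
    finally:
        os.remove(path)
```

Both functions return the same bytes. The difference is that read_span
pays in memory for the skipped data, while read_each pays with one
round trip to the disk per chunk.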

-- 
Mathematics is the supreme nostalgia of our time.