[PATCH] worker: change partition strategy to every Nth element

Matt Mackall mpm at selenic.com
Tue Feb 23 15:28:43 EST 2016


On Mon, 2016-02-22 at 17:46 -0500, Augie Fackler wrote:
> On Sat, Feb 20, 2016 at 04:02:11PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1456012604 28800
> > #      Sat Feb 20 15:56:44 2016 -0800
> > # Node ID 81b1e7ce39d837e352e5ff9e799e512600ab4ba6
> > # Parent  8449ef66f732a71008dc887ffd6efbfb6dc64ee0
> > worker: change partition strategy to every Nth element
> 
> queued this, thanks

Both this and the original partitioning are rather unfortunate from a disk
locality standpoint.

Single-threaded Mercurial makes a point of creating and visiting files in a
fixed order (alphabetical). When creating files in order, a typical filesystem
is likely to allocate them on nearby regions on disk. Thus, when revisiting in
the same order, locality is maximized and various forms of OS and disk-level
caching and read-ahead get a chance to work.

This effect can be quite significant on spinning disks. I discovered it circa
Mercurial v0.4 when revlogs were named by hashes of filenames. Tarring a repo
and copying it to another disk effectively randomized the revlog ordering on
disk by sorting the revlogs by hash and suddenly performance of my kernel
checkout benchmark dropped by ~10x because the "working set" of sectors visited
no longer fit in the drive's cache and the workload switched from streaming to
random I/O. 

What we should really be doing is have workers read filenames from a ordered
queue. This preserves locality and also keeps any worker from getting more than
one file out of balance.

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list