[PATCH] worker: change partition strategy to every Nth element
Matt Mackall
mpm at selenic.com
Tue Feb 23 15:28:43 EST 2016
On Mon, 2016-02-22 at 17:46 -0500, Augie Fackler wrote:
> On Sat, Feb 20, 2016 at 04:02:11PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1456012604 28800
> > # Sat Feb 20 15:56:44 2016 -0800
> > # Node ID 81b1e7ce39d837e352e5ff9e799e512600ab4ba6
> > # Parent 8449ef66f732a71008dc887ffd6efbfb6dc64ee0
> > worker: change partition strategy to every Nth element
>
> queued this, thanks
Both this and the original partitioning are rather unfortunate from a disk
locality standpoint.
Single-threaded Mercurial makes a point of creating and visiting files in a
fixed order (alphabetical). When creating files in order, a typical filesystem
is likely to allocate them on nearby regions on disk. Thus, when revisiting in
the same order, locality is maximized and various forms of OS and disk-level
caching and read-ahead get a chance to work.
This effect can be quite significant on spinning disks. I discovered it circa
Mercurial v0.4 when revlogs were named by hashes of filenames. Tarring a repo
and copying it to another disk effectively randomized the revlog ordering on
disk by sorting the revlogs by hash and suddenly performance of my kernel
checkout benchmark dropped by ~10x because the "working set" of sectors visited
no longer fit in the drive's cache and the workload switched from streaming to
random I/O.
What we should really be doing is have workers read filenames from a ordered
queue. This preserves locality and also keeps any worker from getting more than
one file out of balance.
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel
mailing list