[PATCH] V11 of experiment for a simpler path encoding for hashed paths (for "fncache2")

Matt Mackall mpm at selenic.com
Fri Sep 28 14:21:13 CDT 2012


On Fri, 2012-09-28 at 10:48 -0700, Bryan O'Sullivan wrote:
> On Fri, Sep 28, 2012 at 4:26 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:
> 
> > V11 of experiment for a simpler path encoding for hashed paths (for
> > "fncache2")
> >
> 
> Right now, your patch and mine make a lot of unrepresentable characters
> safe by mapping them to a single character, '~' 0x7e. This changes the sort
> order of hash-encoded names in undesirable ways: for instance, it means
> that files that originally began with dot or space will, once hash-encoded,
> now sort *after* files that begin with alphanumeric characters.
> 
> With a hot disk cache, the effect will obviously be negligible, but with a
> cold cache, it will lead to extra disk head seeks.
> 
> That possibility can easily be mitigated: instead of mapping all
> unrepresentable characters to '~', just map them to a safe character that
> is lexically close instead. For example, space 0x20 could be mapped to '!'
> 0x21, '.' 0x2e could be mapped to ',' 0x2c, and likewise for unsafe
> names such as "aux" ("auy").
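A minimal Python sketch of the sort-order effect described in the quoted mail. The helpers `tilde_map` and `nearby_map` and the two-character unsafe set are hypothetical illustrations, not Mercurial's encoder or either patch. The sketch also ignores name collisions (for example, " x" and "!x" map to the same key).

```python
# Hypothetical sketch: two ways of making an "unsafe" leading character
# representable. The unsafe set below is an assumption for illustration.
UNSAFE_LEADING = {" ", "."}

def tilde_map(name: str) -> str:
    """Replace an unsafe leading character with '~' (0x7e)."""
    if name and name[0] in UNSAFE_LEADING:
        return "~" + name[1:]
    return name

# Lexically close substitutes suggested in the quoted mail:
# space 0x20 -> '!' 0x21, '.' 0x2e -> ',' 0x2c.
NEARBY = {" ": "!", ".": ","}

def nearby_map(name: str) -> str:
    """Replace an unsafe leading character with a nearby safe one."""
    if name and name[0] in NEARBY:
        return NEARBY[name[0]] + name[1:]
    return name

names = [".hgignore", " notes", "Makefile", "setup.py"]

# Original order: [' notes', '.hgignore', 'Makefile', 'setup.py']
print(sorted(names))
# '~' sorts after letters: ['Makefile', 'setup.py', '.hgignore', ' notes']
print(sorted(names, key=tilde_map))
# Nearby substitutes keep the original order here
print(sorted(names, key=nearby_map))
```

With '~', the dot and space names jump past every alphanumeric name; the nearby substitutes leave these four names where they were.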

I'm not terribly concerned. The guideline we're trying to meet is
"pos(x) is close to pos(x+1) for most x", where pos(x) is where the store
data for file x ends up on disk. In other words, we want the opposite of
a hash function.
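One rough way to make the guideline measurable is a small Python sketch. It models pos(x) as a file's index once all encoded names are sorted, which is an assumption: real on-disk placement depends on the filesystem. It then compares an order-preserving encoding with a plain SHA-1 hash. `mean_neighbor_gap` and `sha1_name` are hypothetical helpers.

```python
import hashlib

def mean_neighbor_gap(paths, encode):
    """Average |pos(next) - pos(x)| over consecutive paths.

    pos() is modeled as the index in sorted encoded-name order; this is
    only a stand-in for physical position on disk.
    """
    ordered = sorted(paths)
    store_order = sorted(ordered, key=encode)
    pos = {p: i for i, p in enumerate(store_order)}
    gaps = [abs(pos[b] - pos[a]) for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def sha1_name(path):
    """Hypothetical fully hashed store name: no locality at all."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()

paths = [f"src/mod{d}/file{f}.py" for d in range(10) for f in range(10)]

print(mean_neighbor_gap(paths, lambda p: p))  # 1.0: order fully preserved
print(mean_neighbor_gap(paths, sha1_name))    # far larger: neighbours scatter
```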

What's most important here is intra-directory locality: if I read the
repo data for foo/bar/baz/a, will b, c, and d be nearby? With a sane
filesystem, the answer will usually be yes. That means we'll get
hardware-level readahead caching (because the data is on the same
cylinder), OS-level metadata readahead, and possibly OS-level data
readahead as well; at worst, we'll pay for a short seek.
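Why an order-preserving encoding keeps a directory together can be sketched in Python: all strings sharing the prefix "foo/bar/baz/" form one contiguous run in sorted order. This sketch assumes store order is plain sorted order of encoded names. `store_span` is a hypothetical helper, and the decoy paths "foo/bar-old" and "foo/bar0" are invented to show that only the full "dir/" prefix matters.

```python
def store_span(paths, encode, directory):
    """Return (count, span) of files under `directory` in store order.

    Store order is modeled as sorted encoded names (an assumption);
    count == span means the directory's files sit in one contiguous run.
    """
    store_order = sorted(paths, key=encode)
    prefix = directory.rstrip("/") + "/"
    idx = [i for i, p in enumerate(store_order) if p.startswith(prefix)]
    return len(idx), max(idx) - min(idx) + 1

paths = [
    "foo/bar/baz/a", "foo/bar/baz/b", "foo/bar/baz/c", "foo/bar/baz/d",
    "foo/bar/qux", "foo/bar-old/x", "foo/bar0/y", "README",
]

print(store_span(paths, lambda p: p, "foo/bar/baz"))  # (4, 4)
print(store_span(paths, lambda p: p, "foo/bar"))      # (5, 5)
```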

With a typical 10 or more files per directory, intra-directory locality
dominates, and we largely keep it by largely preserving the tree
structure. Also note that a path like short/directory/really-long-filename
will throw us off our game anyway, since it moves from the normal
encoding to the hashed one.
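The last point can be illustrated with a simplified Python sketch of a length-triggered switch from normal to hashed encoding. The 40-character limit, the `dh/` prefix and the `store_name` helper are assumptions chosen for illustration; Mercurial's actual threshold and hashed-name layout differ. Once the long name crosses the limit, its store file sorts after an unrelated directory instead of next to its siblings.

```python
import hashlib

# Hypothetical limit for illustration only; not Mercurial's real value.
MAX_STORE_PATH = 40

def store_name(path: str) -> str:
    """Normal encoding when short enough, otherwise a hashed name."""
    normal = "data/" + path + ".i"
    if len(normal) <= MAX_STORE_PATH:
        return normal
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    return "dh/" + digest + ".i"

paths = [
    "short/directory/a",
    "short/directory/b",
    "short/directory/really-long-filename-that-overflows",
    "short/directory/z",
    "zebra/x",
]

store_order = sorted(store_name(p) for p in paths)
for name in store_order:
    print(name)
# The hashed sibling ("dh/...") sorts last, after data/zebra/x.i.
```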

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list