[PATCH STABLE] dirstate: ignore symlinks when fs cannot handle them (issue1888)

Matt Mackall mpm at selenic.com
Mon Aug 9 12:15:29 CDT 2010


On Mon, 2010-08-09 at 16:20 +0200, Martin Geisler wrote:
> Matt Mackall <mpm at selenic.com> writes:
> 
> > On Fri, 2010-07-23 at 15:33 +0200, Martin Geisler wrote:
> >> Matt Mackall <mpm at selenic.com> writes:
> >> 
> >> > On Thu, 2010-07-22 at 14:20 +0200, Martin Geisler wrote:
> >> >> # HG changeset patch
> >> >> # User Martin Geisler <mg at aragost.com>
> >> >> # Date 1279801072 -7200
> >> >> # Branch stable
> >> >> # Node ID 16b70e8b69d3175079fc34857890e483bf37f480
> >> >> # Parent  91af149b5cd72dc91c1e3ae4ee018caf7203323e
> >> >> dirstate: ignore symlinks when fs cannot handle them (issue1888)
> >> >
> >> > Looks good.
> >> 
> >> I'm afraid the patch is buggy -- this line
> >> 
> >>   not (mode & lnkkind) or self._checklink
> >> 
> >> should be
> >> 
> >>   getkind(mode) != lnkkind or self._checklink
> >
> > That's just this:
> >
> > def S_IFMT(mode):
> >     return mode & 0170000
> >
> > So perhaps (mode & lnkkind != lnkkind) is sufficient.
> 
> Yes, that works! The performance penalty is now pretty benign as shown
> here where I ran 'hg perfstatus' 10 times:
> 
>   Before:          After:
>   min: 0.544703    min: 0.546549
>   med: 0.547592    med: 0.548881
>   avg: 0.549146    avg: 0.548549
>   max: 0.564112    max: 0.551504
> 
> The median time increased by about 0.24%. I pushed the patch as
> changeset ca6cebd8734e.

Thanks, looks good.
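For reference, the difference between the two checks quoted above can be
demonstrated with plain Python's stat module. This is a standalone
sketch, not the dirstate code itself; it assumes lnkkind is
stat.S_IFLNK, and the modes are illustrative:

```python
import stat

lnkkind = stat.S_IFLNK            # 0o120000

reg_mode = stat.S_IFREG | 0o644   # a typical regular file
lnk_mode = stat.S_IFLNK | 0o777   # a typical symlink

# Buggy check: S_IFREG (0o100000) shares a bit with S_IFLNK (0o120000),
# so "mode & lnkkind" is nonzero for regular files too, and
# "not (mode & lnkkind)" fails to exclude them.
assert (reg_mode & lnkkind) != 0          # truthy -- the bug

# Fixed check: compare the masked bits for equality. Among the POSIX
# file types, only S_IFLNK has both of its type bits set, so the
# equality only holds for symlinks.
assert (reg_mode & lnkkind) != lnkkind    # regular file: not a link
assert (lnk_mode & lnkkind) == lnkkind    # symlink: a link

# Equivalent to the S_IFMT form, since S_IFMT(mode) is mode & 0o170000.
assert stat.S_IFMT(reg_mode) != lnkkind
assert stat.S_IFMT(lnk_mode) == lnkkind
```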

For reference, I generally only pay attention to the min numbers in
these tuning benchmarks. Most of the major sources of noise in these
benchmarks are additive-only (scheduling, power management, disk cache
misses, recompiling), while things that can positively influence it (eg
unusually good cache alignment) are pretty rare and minor. So looking at
min gives us the least-noisy view down into the thing we're trying to
tune.

If you think about what the distribution of timing results looks like in
a perfect environment, you've got a hard wall at some value X
corresponding to optimal cache alignment, hot caches, and minimal
interrupts. Then you've got
another point X2 which is typical cache alignment, typical interrupts,
etc. For a benchmark with a reasonably large memory footprint like
checking the status of 10k files, X and X2 are going to be quite close
together. Right now our distribution has a cut-off at X, a hump around
X2, and a short tail leading off to the right.

Now if we move onto a typical system, with other tasks like GUIs and web
browsers and interrupt sources like mice, etc., our wall remains at X,
but our average value can get pushed to the right by quite a bit, and
the 'hump' will get spread across a wide range centered around a new X3,
and the tail across a much wider range still -- if you get some big
email in the background, your benchmark might get starved for seconds.

For purposes of tuning, X3 is not much use. You need a pretty large
sample size to actually pin it down with any accuracy because the
deviation is much higher. But min() on a decent sample size will
typically get us pretty close to X/X2 again. And optimizing X2 on a
benchmark that's not heavily sensitive to L1 cache effects (i.e.
basically everything you'd ever do in Python) will also optimize X3.
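The effect of additive-only noise on min versus median and mean can be
sketched with a toy simulation (the floor, noise distribution, and
numbers here are illustrative, not real perfstatus data):

```python
import random
import statistics

def simulate_timings(n_runs, floor=0.544, noise_scale=0.005, seed=0):
    """Model benchmark runs: a hard wall at `floor` (the X above) plus
    additive-only noise with a long right tail (scheduling, interrupts,
    cache misses). Noise can only add time, never subtract it."""
    rng = random.Random(seed)
    return [floor + rng.expovariate(1 / noise_scale) for _ in range(n_runs)]

times = simulate_timings(1000)

# The min hugs the wall X; the median and mean drift to the right with
# the noise, and the mean drifts furthest because of the long tail.
assert min(times) >= 0.544
assert min(times) < statistics.median(times) < statistics.mean(times)
```

Widening the noise (a busier machine) moves the median and mean further
right but leaves min(times) nearly pinned to the wall, which is why min
is the least-noisy view of the thing being tuned.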

-- 
Mathematics is the supreme nostalgia of our time.

More information about the Mercurial-devel mailing list