[PATCH 4 of 4] mmapindex: set default to 1MB

Yuya Nishihara yuya at tcha.org
Tue Jan 8 09:41:56 EST 2019


On Mon, 7 Jan 2019 09:45:32 +0100, Boris FELD wrote:
> On 03/01/2019 09:58, Yuya Nishihara wrote:
> > On Wed, 2 Jan 2019 23:40:11 +0100, Boris FELD wrote:
> >> On 04/12/2018 12:09, Yuya Nishihara wrote:
> >>> On Sun, 02 Dec 2018 17:17:43 +0100, Boris Feld wrote:
> >>>> # HG changeset patch
> >>>> # User Boris Feld <boris.feld at octobus.net>
> >>>> # Date 1542949784 -3600
> >>>> #      Fri Nov 23 06:09:44 2018 +0100
> >>>> # Node ID 9708243274585f9544c70925eb0b0fa0ec7aba4f
> >>>> # Parent  0fff68dfbe48d87dce8b8736c0347ed5aa79030e
> >>>> # EXP-Topic mmap
> >>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 970824327458
> >>>> mmapindex: set default to 1MB
> >>> Can you check if strip/rollback properly copy the revlog before truncating it?
> >>>
> >>> If a mmapped revlog is truncated by another process, the mapped memory could be
> >>> invalid. The worst case, the process would be killed by SIGBUS.
> >> Hum good catch, process reading a repository being stripped have always
> >> been up for troubles. However, mmap makes it worse by raising a signal
> >> instead of just having wonky seek. It also introduces new cases where
> >> this can happen.
> > mmap isn't worse because of SIGBUS, but because the index data can be updated
> > after the index length is determined. Before, a single in-memory revlog index
> > was at least consistent.
> >
> >> What shall we do here, I guess our best bet is to intercept these SIGBUS
> >> when reading revlog index?
> Yes, but it would be inconsistent with the data it was pointing to.
> Access to this data would result in error too.

Correct.

> The new thing is that we
> can get SIGBUS while accessing the index data themselves, as you are
> pointing out.

Another new thing is that truncated revisions can be seen as valid changesets
of '000...' hash with 0-length text. If the whole (or maybe dozens of)
revisions were truncated, SIGBUS would be raised. In other cases, the truncated
part would probably be read as zeros.

> > I don't think it'll be easy to handle SIGBUS properly. And SIGBUS won't always
> > be raised. Perhaps, the easiest solution is to copy the revlog index before
> > strip/rollback.
> 
> I'm afraid at the performance impact, we are talking of potentially
> hundreds of MB of index data to be rolled back.
> 
> Maybe we can keep the current truncation in normal transaction rollback
> and use the slower copies for the hg strip command itself (and rewrite)?
> 
> However, I'm afraid we need to come up with a solution for mmap as it
> would be useful to use it more and more.
> 
> Maybe we can come up with something catching the SIGBUS? Or maybe we
> need to never truncate files and keep an alternative way to track the
> maximum offset? Any other ideas?

I've no idea. My point is that catching SIGBUS wouldn't save us from many
possible failures.

> > IIRC, the mmap implementation was highly experimental. I don't know if it's
> > widely tested other than in FB where strip is never used.
> We have been using it internally, and one of our clients deployed it
> too. It results in significant speed and memory improvement.

Yup. I'm just afraid of enabling it by default.


More information about the Mercurial-devel mailing list