[PATCH 4 of 4] mmapindex: set default to 1MB

Gregory Szorc gregory.szorc at gmail.com
Thu Jan 10 19:05:40 EST 2019


On Thu, Jan 10, 2019 at 9:03 AM Boris FELD <boris.feld at octobus.net> wrote:

> On 08/01/2019 15:41, Yuya Nishihara wrote:
> > On Mon, 7 Jan 2019 09:45:32 +0100, Boris FELD wrote:
> >> On 03/01/2019 09:58, Yuya Nishihara wrote:
> >>> On Wed, 2 Jan 2019 23:40:11 +0100, Boris FELD wrote:
> >>>> On 04/12/2018 12:09, Yuya Nishihara wrote:
> >>>>> On Sun, 02 Dec 2018 17:17:43 +0100, Boris Feld wrote:
> >>>>>> # HG changeset patch
> >>>>>> # User Boris Feld <boris.feld at octobus.net>
> >>>>>> # Date 1542949784 -3600
> >>>>>> #      Fri Nov 23 06:09:44 2018 +0100
> >>>>>> # Node ID 9708243274585f9544c70925eb0b0fa0ec7aba4f
> >>>>>> # Parent  0fff68dfbe48d87dce8b8736c0347ed5aa79030e
> >>>>>> # EXP-Topic mmap
> >>>>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >>>>>> #              hg pull
> https://bitbucket.org/octobus/mercurial-devel/ -r 970824327458
> >>>>>> mmapindex: set default to 1MB
> >>>>> Can you check if strip/rollback properly copy the revlog before
> truncating it?
> >>>>>
> >>>>> If a mmapped revlog is truncated by another process, the mapped
> memory could be
> >>>>> invalid. The worst case, the process would be killed by SIGBUS.
> >>>> Hum good catch, process reading a repository being stripped have
> always
> >>>> been up for troubles. However, mmap makes it worse by raising a signal
> >>>> instead of just having wonky seek. It also introduces new cases where
> >>>> this can happen.
> >>> mmap isn't worse because of SIGBUS, but because the index data can be
> updated
> >>> after the index length is determined. Before, a single in-memory
> revlog index
> >>> was at least consistent.
> >>>
> >>>> What shall we do here, I guess our best bet is to intercept these
> SIGBUS
> >>>> when reading revlog index?
> >> Yes, but it would be inconsistent with the data it was pointing to.
> >> Access to this data would result in error too.
> > Correct.
> >
> >> The new thing is that we
> >> can get SIGBUS while accessing the index data themselves, as you are
> >> pointing out.
> > Another new thing is that truncated revisions can be seen as valid
> changesets
> > of '000...' hash with 0-length text. If the whole (or maybe dozens of)
> > revisions were truncated, SIGBUS would be raised. In other cases, the
> truncated
> > part would probably be read as zeros.
> >
> >>> I don't think it'll be easy to handle SIGBUS properly. And SIGBUS
> won't always
> >>> be raised. Perhaps, the easiest solution is to copy the revlog index
> before
> >>> strip/rollback.
> >> I'm afraid at the performance impact, we are talking of potentially
> >> hundreds of MB of index data to be rolled back.
> >>
> >> Maybe we can keep the current truncation in normal transaction rollback
> >> and use the slower copies for the hg strip command itself (and rewrite)?
> >>
> >> However, I'm afraid we need to come up with a solution for mmap as it
> >> would be useful to use it more and more.
> >>
> >> Maybe we can come up with something catching the SIGBUS? Or maybe we
> >> need to never truncate files and keep an alternative way to track the
> >> maximum offset? Any other ideas?
> > I've no idea. My point is that catching SIGBUS wouldn't save us from many
> > possible failures.
> >
> >>> IIRC, the mmap implementation was highly experimental. I don't know if
> it's
> >>> widely tested other than in FB where strip is never used.
> >> We have been using it internally, and one of our clients deployed it
> >> too. It results in significant speed and memory improvement.
> > Yup. I'm just afraid of enabling it by default.
>
> Ok, what do you think we should do for 4.9?
>

We're introducing a new repo requirement in 4.9 for sparse revlogs. I
believe this strip problem (along with a ton of other problems) goes away
if we have reader locks on repos.

Since we are already introducing a new repo requirement for new repos, we
can introduce yet more repo requirements without any additional significant
user inconvenience.

I know it is a stretch since we only have a few days left before the 4.9
RC, but is anybody will to implement reader locks (or some other repo
mutual exclusion mechanism that could mitigate the strip problem)? Of
course, reader locks break the ability to open a repository on a read-only
filesystem. So, we shouldn't use them lightly! But we could do something
like write a repo/transaction "generation number" into .hg/store and read
it to detect repo mutations before certain operations. We could potentially
also have "mmap mode" require writing a file, socket file, etc into
.hg/store and have any writer refuse to strip if there is an active reader
doing mmap.

That's a lot of random ideas, I know. But I think that in order to enable
mmap by default, we need to lock out mutations on mmap'd file segments in
order to prevent crashes. This seemingly requires a reader lock or
transaction changes of some form.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20190110/0011af4b/attachment.html>


More information about the Mercurial-devel mailing list