[PATCH 1 of 6] transaction: ensure journal is committed to file system

Matt Mackall mpm at selenic.com
Tue Apr 21 16:29:38 CDT 2009


On Tue, 2009-04-21 at 21:14 +0200, Dirkjan Ochtman wrote:
> On Tue, Apr 21, 2009 at 19:33, Henrik Stuart <hg at hstuart.dk> wrote:
> > +    def _write(self, data):
> >         # add enough data to the journal to do the truncate
> > -        self.file.write("%s\0%d\n" % (file, offset))
> > +        self.file.write(data)
> >         self.file.flush()
> > +        # ensure journal is synced to file system
> > +        os.fsync(self.file.fileno())
> 
> I believe Matt stated that he didn't want this.

I'm not really thrilled by it. For starters, it's not actually
sufficient. It doesn't guarantee that the directory containing the inode
is synced to disk.

Second, this will get called a -lot- on some operations. We can
currently call journal write hundreds of times per second. If we throw
a sync in here, we're now writing to at least two (if not three or
four) different blocks on disk with each call and incurring seek
penalties.

If we have a flash device, we could burn through tens of thousands of
block writes in a single clone. Ouch.

If we've got a filesystem like ext3 which (until very recently) defaults
to ordered writes, each sync is basically a pipeline stall for every
user of the filesystem.

So look at the journal more as an optimization for cleanup and less as
an integrity tool. We can always use verify to find the last
self-consistent revision.

If you're so inclined, we can add a companion to the journal that stores
just that revision number and gets synced at the beginning of a
transaction. Then we can add a slow path in recover that strips to that
revision.

-- 
http://selenic.com : development and support for Mercurial and Linux
