[PATCH 1 of 2] journal: add dirstate tracking

Yuya Nishihara yuya at tcha.org
Sun Jul 3 03:42:26 UTC 2016


On Thu, 30 Jun 2016 13:41:25 +0100, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters at fb.com>
> # Date 1467290363 -3600
> #      Thu Jun 30 13:39:23 2016 +0100
> # Node ID 09c7b17ab01c202460543f02d9d71ac643f85a20
> # Parent  cf092a3d202a11a670f2bbfd514bc07d6cc4cee7
> journal: add dirstate tracking

> +# hooks to record dirstate changes
> +def wrapdirstate(orig, repo):
> +    dirstate = orig(repo)
> +    if util.safehasattr(repo, 'journal'):
> +        dirstate.journalrepo = repo
> +    return dirstate

IIRC, it's considered a layering violation to pass the repo to the dirstate.
Perhaps it could be dirstate.journal = repo.journal, or the journal could be
carried by a transaction object.
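A minimal sketch of that alternative, attaching only the journal object so the
dirstate never gains a back-reference to the repository layer (the stub class
and getattr() stand in for the real Mercurial objects and util.safehasattr;
names follow the patch but this is an illustration, not the actual change):

```python
class journalstorage(object):
    """Stand-in for the patch's journal storage object."""

    def __init__(self):
        self.entries = []

    def record(self, entrytype, name, oldhashes, newhashes):
        # the real implementation persists to .hg/journal; we just collect
        self.entries.append((entrytype, name, oldhashes, newhashes))

def wrapdirstate(orig, repo):
    dirstate = orig(repo)
    journal = getattr(repo, 'journal', None)
    if journal is not None:
        # carry only the journal, not the whole repo, to keep layering clean
        dirstate.journal = journal
    return dirstate
```

The hook that records parent changes can then call dirstate.journal.record()
directly, without ever touching the repo from inside the dirstate.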

> +def recorddirstateparents(orig, dirstate, dirstatefp):
> +    """Records all dirstate parent changes in the journal."""
> +    if util.safehasattr(dirstate, 'journalrepo'):
> +        old = [node.nullid, node.nullid]
> +        nodesize = len(node.nullid)
> +        try:
> +            # The only source for the old state is in the dirstate file
> +            # still on disk; the in-memory dirstate object only contains
> +            # the new state.
> +            with dirstate._opener(dirstate._filename) as fp:
> +                state = fp.read(2 * nodesize)
> +            if len(state) == 2 * nodesize:
> +                old = [state[:nodesize], state[nodesize:]]
> +        except IOError:
> +            pass
> +
> +        new = dirstate.parents()
> +        if old != new:
> +            # only record two hashes if there was a merge
> +            oldhashes = old[:1] if old[1] == node.nullid else old
> +            newhashes = new[:1] if new[1] == node.nullid else new
> +            dirstate.journalrepo.journal.record(
> +                wdirparenttype, '.', oldhashes, newhashes)
> +
> +    return orig(dirstate, dirstatefp)

dirstatefp could be a backup file. Maybe that's okay because the old state
should be identical to the current state for backups, but I'm not quite sure.
Can you take a look and add a comment?

> +    def _write(self, entry, _wait=False):
> +        try:
> +            with self.repo.wlock(wait=_wait):
> +                version = None
> +                # open file in append mode to ensure it is created if missing
> +                with self.vfs('journal', mode='a+b', atomictemp=True) as f:
> +                    f.seek(0, os.SEEK_SET)
> +                    # Read just enough bytes to get a version number (up to 2
> +                    # digits plus separator)
> +                    version = f.read(3).partition('\0')[0]
> +                    if version and version != str(storageversion):
> +                        # different version of the storage. Exit early (and
> +                        # not write anything) if this is not a version we can
> +                        # handle or the file is corrupt. In future, perhaps
> +                        # rotate the file instead?
> +                        self.repo.ui.warn(
> +                            _("unsupported journal file version '%s'\n") %
> +                            version)
> +                        return
> +                    if not version:
> +                        # empty file, write version first
> +                        f.write(str(storageversion) + '\0')
> +                    f.seek(0, os.SEEK_END)
> +                    f.write(str(entry) + '\0')
> +        except error.LockHeld as lockerr:
> +            lock = self.repo._wlockref and self.repo._wlockref()
> +            if lock and lockerr.locker == '%s:%s' % (lock._host, lock.pid):
> +                # the dirstate can be written out during wlock unlock, before
> +                # the lockfile is removed. Re-run the write as a postrelease
> +                # function instead.
> +                lock.postrelease.append(
> +                    lambda: self._write(entry, _wait=True))
> +            else:
> +                # another process holds the lock, retry and wait
> +                self._write(entry, _wait=True)

I understand why this is necessary, but it seems wrong to try to take a lock
in lock.releasefn(), which can't guarantee consistency. We could delay
"lock.held = 0" until after releasefn() since the lock hasn't been released
yet at that point, but that would result in two releasefn() calls.

Instead, I think it will be much simpler if the journal has a dedicated lock,
and we can get rid of the cycle between journalstorage and localrepo.
