[PATCH 08 of 10] repair: migrate revlogs during upgrade

Gregory Szorc gregory.szorc at gmail.com
Sun Nov 6 00:40:24 EDT 2016


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1478393405 25200
#      Sat Nov 05 17:50:05 2016 -0700
# Node ID d2261c558ca9639fb81c182de15d75151cbad0f9
# Parent  958bcf2577608bbb6d8ae078cde0ca451f3ab31a
repair: migrate revlogs during upgrade

Our next step for in-place upgrade is to migrate store data. Revlogs
are the biggest source of data within the store and a store is useless
without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call
`revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has
generaldelta and delta chains disabled), we need to obtain the
correct class of revlog so inserted data is encoded properly for its
type.

Because manifests are converted after filelogs and because manifest
conversion can take a long time when large manifests are in play,
a naive progress bar for revlog count was misleading, as it effectively
got to 99% and froze there when processing the manifest. So, a first
pass counts revisions and the progress bar tracks revisions instead of
revlogs. The extra code is somewhat annoying. But this pass serves a
secondary useful purpose: ensuring we can open all revlogs that will be
copied.
We don't want to spend several minutes copying revlogs only to
encounter a permissions error or some such later.
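
The two-pass idea can be sketched outside of Mercurial as follows. This
is an illustration only, not the code in this patch: the function name
and the `open_log`, `copy_log` and `report` callables are hypothetical
stand-ins for opening a revlog, cloning it, and updating a progress bar.

```python
def copy_with_unified_progress(sources, open_log, copy_log, report):
    """Copy every log in `sources`, reporting progress per revision.

    All parameters are hypothetical stand-ins:
      open_log(name)        -> a sized object (len() == revision count)
      copy_log(log, on_rev) -> copies `log`, calling on_rev() per revision
      report(done, total)   -> progress sink
    """
    # Pass 1: open every source up front. Any failure to open surfaces
    # here, before copying starts, and we learn the total revision count.
    total = 0
    for name in sources:
        total += len(open_log(name))

    done = 0

    def on_revision():
        nonlocal done
        done += 1
        report(done, total)

    # Pass 2: do the actual copying against the known total.
    for name in sources:
        copy_log(open_log(name), on_revision)

    return total
```

Because the total is expressed in revisions, one very large log advances
the bar steadily instead of stalling it on the last item.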

As part of this change, we also add swapping of the store directory
to the upgrade function. After revlogs are converted, we move the
old store into the backup directory then move the temporary repo's
store into the old store's location. On well-behaved systems, this
should be 2 atomic operations and the window of inconsistency should be
very narrow.
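
A minimal sketch of the swap, assuming plain `os.rename` in place of
Mercurial's `util.rename` and hypothetical path arguments. Each rename
is only expected to be atomic when source and destination live on the
same filesystem, and the backup location's parent directory is assumed
to exist already.

```python
import os
import time


def swap_store(live_store, new_store, backup_store):
    """Replace `live_store` with `new_store`, keeping the old one as backup.

    Returns the number of seconds during which `live_store` was missing
    or being replaced. All three paths are hypothetical examples.
    """
    start = time.time()
    # Step 1: move the current store out of the way. From here until
    # step 2 finishes, readers will not find a store at `live_store`.
    os.rename(live_store, backup_store)
    # Step 2: move the freshly built store into the live location.
    os.rename(new_store, live_store)
    return time.time() - start
```

If the second rename fails, the old store is still intact at the backup
location and can be moved back by hand.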

There are still a number of improvements that need to be made for
store copying...

diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -11,15 +11,19 @@ from __future__ import absolute_import
 import errno
 import hashlib
 import tempfile
+import time
 
 from .i18n import _
 from .node import short
 from . import (
     bundle2,
     changegroup,
+    changelog,
     error,
     exchange,
+    manifest,
     obsolete,
+    revlog,
     scmutil,
     util,
 )
@@ -537,6 +541,87 @@ def upgradesummarizeactions(repo, action
 
     return l, handled
 
+def _revlogfrompath(repo, path):
+    """Obtain a revlog from a repo path.
+
+    An instance of the appropriate class is returned.
+    """
+    if path == '00changelog.i':
+        return changelog.changelog(repo.svfs)
+    elif path.endswith('00manifest.i'):
+        mandir = path[:-len('00manifest.i')]
+        return manifest.manifestrevlog(repo.svfs, dir=mandir)
+    else:
+        # Filelogs don't do anything special with settings. So we can use a
+        # vanilla revlog.
+        return revlog.revlog(repo.svfs, path)
+
+def _copyrevlogs(ui, srcrepo, dstrepo, tr):
+    """Copy revlogs between 2 repos.
+
+    Full decoding/encoding is performed on both ends, ensuring that revlog
+    settings on the destination are honored.
+    """
+    rlcount = 0
+    revcount = 0
+    srcsize = 0
+    dstsize = 0
+
+    # Perform a pass to collect metadata. This validates we can open all
+    # source files and allows a unified progress bar to be displayed.
+    for unencoded, encoded, size in srcrepo.store.walk():
+        srcsize += size
+        if unencoded.endswith('.d'):
+            continue
+        rl = _revlogfrompath(srcrepo, unencoded)
+        rlcount += 1
+        revcount += len(rl)
+
+    ui.write(_('migrating %d revlogs containing %d revisions (%d bytes)\n') %
+             (rlcount, revcount, srcsize))
+
+    if not rlcount:
+        return
+
+    # Used to keep track of progress.
+    convertedcount = [0]
+    def oncopiedrevision(rl, rev, node):
+        convertedcount[0] += 1
+        srcrepo.ui.progress(_('revisions'), convertedcount[0], total=revcount)
+
+    # Do the actual copying.
+    # FUTURE this operation can be farmed off to worker processes.
+    seen = set()
+    for unencoded, encoded, size in srcrepo.store.walk():
+        if unencoded.endswith('.d'):
+            continue
+
+        ui.progress(_('revisions'), convertedcount[0], total=revcount)
+
+        oldrl = _revlogfrompath(srcrepo, unencoded)
+        newrl = _revlogfrompath(dstrepo, unencoded)
+
+        if isinstance(oldrl, manifest.manifestrevlog) and 'm' not in seen:
+            seen.add('m')
+            ui.write(_('migrating manifests...\n'))
+        elif isinstance(oldrl, changelog.changelog) and 'c' not in seen:
+            seen.add('c')
+            ui.write(_('migrating changelog...\n'))
+        elif 'f' not in seen:
+            seen.add('f')
+            ui.write(_('migrating file histories...\n'))
+
+        ui.note(_('copying %d revisions from %s\n') % (len(oldrl), unencoded))
+        oldrl.clone(tr, newrl, addrevisioncb=oncopiedrevision)
+
+        dstsize += newrl.totalfilesize()
+
+    ui.progress(_('revisions'), None)
+
+    ui.write(_('revlogs migration complete; wrote %d bytes (delta %d bytes) '
+               'across %d revlogs and %d revisions\n') % (
+             dstsize, dstsize - srcsize, rlcount, revcount))
+
 def _upgraderepo(ui, srcrepo, dstrepo, requirements):
     """Do the low-level work of upgrading a repository.
 
@@ -550,7 +635,15 @@ def _upgraderepo(ui, srcrepo, dstrepo, r
     assert srcrepo.currentwlock()
     assert dstrepo.currentwlock()
 
-    # TODO copy store
+    ui.write(_('(it is safe to interrupt this process any time before '
+               'data migration completes)\n'))
+
+    with dstrepo.transaction('upgrade') as tr:
+        _copyrevlogs(ui, srcrepo, dstrepo, tr)
+
+        # TODO copy non-revlog store files
+
+    ui.write(_('data fully migrated to temporary repository\n'))
 
     ui.write(_('starting in-place swap of repository data\n'))
     ui.warn(_('(clients may error or see inconsistent repository data until '
@@ -567,6 +660,17 @@ def _upgraderepo(ui, srcrepo, dstrepo, r
     util.copyfile(srcrepo.join('requires'), backupvfs.join('requires'))
     scmutil.writerequires(srcrepo.vfs, requirements)
 
+    # Now swap in the new store directory. Doing it as a rename should make
+    # the operation nearly instantaneous and atomic (at least in well-behaved
+    # environments).
+    ui.write(_('replacing store...\n'))
+    tstart = time.time()
+    util.rename(srcrepo.spath, backupvfs.join('store'))
+    util.rename(dstrepo.spath, srcrepo.spath)
+    elapsed = time.time() - tstart
+    ui.write(_('store replacement complete; repository was inconsistent for '
+               '%0.1fs\n') % elapsed)
+
 def upgraderepo(ui, repo, dryrun=False):
     """Upgrade a repository in place."""
     # Avoid cycle: cmdutil -> repair -> localrepo -> cmdutil
diff --git a/tests/test-upgrade-repo.t b/tests/test-upgrade-repo.t
--- a/tests/test-upgrade-repo.t
+++ b/tests/test-upgrade-repo.t
@@ -9,10 +9,15 @@
   starting repository upgrade
   source repository locked and read-only
   creating temporary repository to stage migrated data: $TESTTMP/empty/.hg/upgrade.* (glob)
+  (it is safe to interrupt this process any time before data migration completes)
+  migrating 0 revlogs containing 0 revisions (0 bytes)
+  data fully migrated to temporary repository
   starting in-place swap of repository data
   (clients may error or see inconsistent repository data until this operation completes)
   replaced files will be backed up at $TESTTMP/empty/.hg/upgradebackup.* (glob)
   updating requirements in $TESTTMP/empty/.hg/requires
+  replacing store...
+  store replacement complete; repository was inconsistent for 0.0s
   removing temporary repository $TESTTMP/empty/.hg/upgrade.* (glob)
 
 dry run works
@@ -51,10 +56,15 @@ Various sub-optimal detections work
   starting repository upgrade
   source repository locked and read-only
   creating temporary repository to stage migrated data: $TESTTMP/empty/.hg/upgrade.* (glob)
+  (it is safe to interrupt this process any time before data migration completes)
+  migrating 0 revlogs containing 0 revisions (0 bytes)
+  data fully migrated to temporary repository
   starting in-place swap of repository data
   (clients may error or see inconsistent repository data until this operation completes)
   replaced files will be backed up at $TESTTMP/empty/.hg/upgradebackup.* (glob)
   updating requirements in $TESTTMP/empty/.hg/requires
+  replacing store...
+  store replacement complete; repository was inconsistent for 0.0s
   removing temporary repository $TESTTMP/empty/.hg/upgrade.* (glob)
 
   $ cd ..
@@ -133,10 +143,19 @@ Upgrading a repository to generaldelta w
   starting repository upgrade
   source repository locked and read-only
   creating temporary repository to stage migrated data: $TESTTMP/upgradegd/.hg/upgrade.* (glob)
+  (it is safe to interrupt this process any time before data migration completes)
+  migrating 5 revlogs containing 9 revisions (917 bytes)
+  migrating file histories...
+  migrating manifests...
+  migrating changelog...
+  revlogs migration complete; wrote 917 bytes (delta 0 bytes) across 5 revlogs and 9 revisions
+  data fully migrated to temporary repository
   starting in-place swap of repository data
   (clients may error or see inconsistent repository data until this operation completes)
   replaced files will be backed up at $TESTTMP/upgradegd/.hg/upgradebackup.* (glob)
   updating requirements in $TESTTMP/upgradegd/.hg/requires
+  replacing store...
+  store replacement complete; repository was inconsistent for 0.0s
   removing temporary repository $TESTTMP/upgradegd/.hg/upgrade.* (glob)
 
 Original requirements backed up
@@ -156,4 +175,43 @@ generaldelta added to original requireme
   revlogv1
   store
 
+store directory has files we expect
+
+  $ ls .hg/store
+  00changelog.i
+  00manifest.i
+  data
+  fncache
+  undo
+  undo.backupfiles
+  undo.phaseroots
+
+manifest should be generaldelta
+
+  $ hg debugrevlog -m | grep flags
+  flags  : inline, generaldelta
+
+verify should be happy
+
+  $ hg verify
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  3 files, 3 changesets, 3 total revisions
+
+old store should be backed up
+
+  $ ls .hg/upgradebackup.*/store
+  00changelog.i
+  00manifest.i
+  data
+  fncache
+  lock
+  phaseroots
+  undo
+  undo.backup.fncache
+  undo.backupfiles
+  undo.phaseroots
+
   $ cd ..

