[PATCH RFC] repair: add mechanism to convert/upgrade a repo in place

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Feb 23 09:26:24 EST 2016



On 02/16/2016 12:47 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1455580051 28800
> #      Mon Feb 15 15:47:31 2016 -0800
> # Node ID d65feeab5bc118778d9523e7df44fe38804046ce
> # Parent  a3fcea8d55f7c2b3e9d83c00cbe303890a906775
> repair: add mechanism to convert/upgrade a repo in place
>
> Pierre-Yves wants a debug command to "upgrade" repositories to
> generaldelta. Taking a step backward, upgrading a repository to
> generaldelta is a subset of the general task of adding a new
> requirement to an existing repository.

Bitbucket would be delighted to have it.

> This patch begins the implementation of a generic, in-place
> repository "upgrade" mechanism that can be performed on a live
> repo with minimal downtime (as opposed to `hg clone` which
> doesn't take out a lock on the source repo and therefore is
> susceptible to repositories gaining new data while operating
> on them).

I like the idea. This is a smart trick.

> It basically creates a new, empty repo and then iterates over all
> store files from the old repository and copies them, applying any
> requirements differences along the way. Not only will non-gd repos
> get converted to gd, but fncache and dotencode will be added as well,
> assuming they haven't been disabled.
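The copy phase described above can be sketched with the standard library alone. This is a minimal sketch, assuming a flat store directory. It mirrors only the skip rules from the patch's second loop and leaves revlogs to a separate conversion step, as the patch does with `revlog.clone()`. `copy_nonrevlog_files` and `IGNORED_STORE_FILES` are hypothetical names, not Mercurial APIs.

```python
import os
import shutil
import stat

# Hypothetical constant mirroring the patch's ignorestorefiles set.
IGNORED_STORE_FILES = {"lock", "fncache"}


def copy_nonrevlog_files(src_store, dst_store):
    """Copy the plain files of a flat store directory, skipping what the
    patch skips. Returns the sorted names of the files that were copied."""
    copied = []
    for name in sorted(os.listdir(src_store)):
        path = os.path.join(src_store, name)
        # Revlogs are converted separately (the patch uses revlog.clone()).
        if name.endswith((".i", ".d")):
            continue
        # Transaction leftovers such as undo, undo.phaseroots, ...
        if name.startswith("undo"):
            continue
        # The lock and fncache files are skipped, as in ignorestorefiles.
        if name in IGNORED_STORE_FILES:
            continue
        # Only regular files: this skips data/ and any symlinks.
        if not stat.S_ISREG(os.lstat(path).st_mode):
            continue
        # copy2 also preserves timestamps, like copyfile(..., copystat=True).
        shutil.copy2(path, os.path.join(dst_store, name))
        copied.append(name)
    return copied
```

With a store holding `00changelog.i`, `undo`, `lock`, `phaseroots` and a `data/` directory, only `phaseroots` would be copied.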
>
> This patch is RFC quality. We need tests. We also need to consider
> how extensions need to hook into this.

I like the idea, even if I have multiple pieces of feedback on the current
implementation. What is your plan for letting extensions hook into this?

> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -3516,16 +3516,26 @@ def debugsuccessorssets(ui, repo, *revs)
>               if succsset:
>                   ui.write('    ')
>                   ui.write(node2str(succsset[0]))
>                   for node in succsset[1:]:
>                       ui.write(' ')
>                       ui.write(node2str(node))
>               ui.write('\n')
>
> +@command('debugupgraderepo')
> +def debugupgraderepo(ui, repo):
> +    """upgrade a repository to use different features
> +
> +    During the upgrade, errors may be encountered when reading from the
> +    repository. This command should therefore not be executed on live
> +    repositories.
> +    """
> +    repair.upgraderepo(repo)
> +
>   @command('debugwalk', walkopts, _('[OPTION]... [FILE]...'), inferrepo=True)
>   def debugwalk(ui, repo, *pats, **opts):
>       """show how files match on given patterns"""
>       m = scmutil.match(repo[None], pats, opts)
>       items = list(repo.walk(m))
>       if not items:
>           return
>       f = lambda fn: fn
> diff --git a/mercurial/repair.py b/mercurial/repair.py
> --- a/mercurial/repair.py
> +++ b/mercurial/repair.py
> @@ -4,24 +4,28 @@
>   # Copyright 2007 Matt Mackall
>   #
>   # This software may be used and distributed according to the terms of the
>   # GNU General Public License version 2 or any later version.
>
>   from __future__ import absolute_import
>
>   import errno
> +import stat
>
>   from .i18n import _
>   from .node import short
>   from . import (
>       bundle2,
>       changegroup,
>       error,
>       exchange,
> +    localrepo,
> +    revlog,
> +    scmutil,
>       util,
>   )
>
>   def _bundle(repo, bases, heads, node, suffix, compress=True):
>       """create a bundle with the specified revisions as a backup"""
>       cgversion = changegroup.safeversion(repo)
>
>       cg = changegroup.changegroupsubset(repo, bases, heads, 'strip',
> @@ -307,8 +311,129 @@ def stripbmrevset(repo, mark):
>
>       Needs to live here so extensions can use it and wrap it even when strip is
>       not enabled or not present on a box.
>       """
>       return repo.revs("ancestors(bookmark(%s)) - "
>                        "ancestors(head() and not bookmark(%s)) - "
>                        "ancestors(bookmark() and not bookmark(%s))",
>                        mark, mark, mark)
> +
> +# Repository requirements that upgraderepo() can support.
> +supportedupgraderequirements = set([
> +    'fncache',
> +    'dotencode',
> +    'generaldelta',
> +    'revlogv1',
> +    'store',
> +])
> +
> +# Files that should not be copied to the new store as part of an upgrade.
> +ignorestorefiles = set([
> +    'lock',
> +    'fncache',
> +])
> +
> +def upgraderepo(repo):
> +    """Convert a repository to use different features/requirements.
> +
> +    This function performs an in-place "upgrade" of a repository to use
> +    a different set of repository/store features/requirements. It is
> +    intended to convert repositories to use modern features.
> +    """
> +    repo = repo.unfiltered()
> +
> +    if 'store' not in repo.requirements:
> +        raise util.Abort(_('cannot convert repositories missing the "store" '
> +                           'requirement'),
> +                         hint=_('use "hg clone --pull"'))

Can't we? We could build a store directory and drop it into place.

> +    # FUTURE provide ability to adjust requirements via function arguments.
> +    createreqs = localrepo.newreporequirements(repo)
> +    missingreqs = createreqs - repo.requirements
> +    removedreqs = repo.requirements - createreqs

Why not just use the "format" config section for that? That way you can
upgrade your whole fleet of repositories in one shot by following a global
config. It would also mean that the format is controlled the same way for
cloning and for upgrading.
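A sketch of what driving the target requirements from the "format" section could look like, assuming the section is handed over as a plain dict of already-parsed booleans. The option names mirror Mercurial's `[format]` options (`usestore`, `usefncache`, `dotencode`, `usegeneraldelta`), but the defaults here are assumptions, and `requirements_from_format` and `plan_upgrade` are hypothetical helpers.

```python
# Hypothetical helpers. Option names mirror Mercurial's [format] section;
# the defaults below are assumptions for this sketch.
def requirements_from_format(fmt):
    """Map parsed [format] booleans to the requirement set a new repo gets."""
    reqs = {"revlogv1"}
    if fmt.get("usestore", True):
        reqs.add("store")
        # In this sketch, fncache and dotencode are only added on top of store.
        if fmt.get("usefncache", True):
            reqs.add("fncache")
            if fmt.get("dotencode", True):
                reqs.add("dotencode")
    if fmt.get("usegeneraldelta", True):
        reqs.add("generaldelta")
    return reqs


def plan_upgrade(current, fmt, supported):
    """Return (target, missing, removed, unsupported) requirement sets."""
    target = requirements_from_format(fmt)
    missing = target - current
    removed = current - target
    unsupported = missing - supported
    return target, missing, removed, unsupported
```

Under the patch's rules, a non-empty `removed` or `unsupported` set would abort; otherwise `missing` is what gets added.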


> +
> +    if removedreqs:
> +        raise util.Abort(_('cannot convert repository; removing requirement '
> +                           'not supported: %s') %
> +                         ', '.join(sorted(removedreqs)))

You should move this long message into a temporary variable. The
multi-line expression is getting quite hard to follow.
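A sketch of the temporary-variable style, which also keeps the `%` outside `_()` so the untranslated template is what gets looked up. `Abort` and `_` below are stand-ins so the snippet runs on its own; `checkremovedreqs` is a hypothetical helper.

```python
class Abort(Exception):
    """Stand-in for Mercurial's Abort exception, so the sketch runs alone."""

    def __init__(self, msg, hint=None):
        Exception.__init__(self, msg)
        self.hint = hint


def _(message):
    """Stand-in for the gettext wrapper; a real one would look up a catalog."""
    return message


def checkremovedreqs(removedreqs):
    """Abort if the upgrade would drop any requirement (hypothetical helper)."""
    if removedreqs:
        # Translate the constant template first, then interpolate: the
        # catalog is keyed on the untranslated template, so formatting
        # inside _() would defeat the lookup.
        msg = _('cannot convert repository; removing requirement '
                'not supported: %s') % ', '.join(sorted(removedreqs))
        raise Abort(msg)
```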

> +
> +    unsupportedreqs = missingreqs - supportedupgraderequirements
> +    if unsupportedreqs:
> +        raise util.Abort(_('cannot convert repository; new requirement not '
> +                           'supported: %s') %
> +                         ', '.join(sorted(unsupportedreqs)))
> +
> +    repo.ui.write(_('adding requirements: %s\n') %
> +                  ', '.join(sorted(missingreqs)))
> +
> +    with repo.wlock():
> +        with repo.lock():
> +            _upgradestore(repo, createreqs)
> +
> +            # TODO invalidate repo.svfs and other cached objects.

As Yuya smartly spotted, we have a race here with other readers that read
the requirements before waiting on the lock. Should we temporarily
overwrite the requirements file with some XXXBEINGUPGRADEDXXX requirement?
We would still have a race, but it would be a small one.
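A minimal sketch of the sentinel idea, assuming readers refuse any requirement they do not recognize, which is how Mercurial treats unknown entries in `.hg/requires`. The file holds one requirement per line. `write_requires` and `read_requires` are hypothetical helpers; only the sentinel name comes from the suggestion above.

```python
import os

# Sentinel requirement from the review; no client lists it as supported,
# so readers refuse the repository while it is present.
SENTINEL = "XXXBEINGUPGRADEDXXX"


def write_requires(path, reqs):
    """Write one requirement per line to a temp file, then rename it over
    the original so readers never see a half-written file (atomic on POSIX)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fp:
        for req in sorted(reqs):
            fp.write(req + "\n")
    os.replace(tmp, path)


def read_requires(path, supported):
    """Read requirements and refuse to proceed on anything unknown."""
    with open(path) as fp:
        reqs = set(fp.read().splitlines())
    unknown = reqs - supported
    if unknown:
        raise RuntimeError("repository requires unknown features: %s"
                           % ", ".join(sorted(unknown)))
    return reqs
```

The upgrade would write the old set plus `SENTINEL` under the lock, rebuild the store, then write the new set. A reader that loaded the file before the sentinel landed is the remaining small race.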


> +
> +def _upgradestore(repo, requirements):
> +    try:
> +        # It is easier to create a new repo than to instantiate all the
> +        # components separately.
> +        tmprepo = localrepo.localrepository(repo.baseui,
> +                                            path=repo.join('tmprepo'),
> +                                            create=True)

I have a bad feeling about this, but I don't have a better proposal.

> +        with tmprepo.transaction('upgrade') as tr:
> +            # Start by cloning revlogs individually.
> +            total = 0
> +            for t in repo.store.walk():
> +                if t[0].endswith('.i'):
> +                    total += 1
> +
> +            i = 0
> +            for unencoded, encoded, size in repo.store.walk():
> +                if unencoded.endswith('.d'):
> +                    continue
> +
> +                i += 1
> +                repo.ui.progress('upgrade', i, total=total)
> +
> +                oldrl = revlog.revlog(repo.svfs, unencoded)
> +                newrl = revlog.revlog(tmprepo.svfs, unencoded)
> +
> +                # generaldelta is never enabled on changelog because it isn't
> +                # useful.
> +                if unencoded == '00changelog.i':
> +                    newrl.version &= ~revlog.REVLOGGENERALDELTA
> +                    newrl._generaldelta = False
> +
> +                oldrl.clone(newrl, tr)
> +
> +            repo.ui.progress('upgrade', None)

Could we get a unified progress bar?
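A single bar needs both counts up front, i.e. collecting the revlog indexes and the other store files before starting. A sketch, assuming both lists are already built; `unified_progress` is a hypothetical helper and `report(pos, total)` stands in for `ui.progress()`.

```python
def unified_progress(revlogs, otherfiles, report):
    """Yield every item of both phases while reporting one shared position.

    `report(pos, total)` stands in for ui.progress(); the final call with
    pos=None closes the bar, like ui.progress('upgrade', None) in the patch.
    """
    items = [("revlog", name) for name in revlogs]
    items += [("file", name) for name in otherfiles]
    total = len(items)
    for pos, item in enumerate(items, 1):
        report(pos, total)
        yield item
    report(None, total)
```

The caller would dispatch on the tag, cloning `"revlog"` items and copying `"file"` items, so the bar advances across both phases.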

> +
> +            # Now copy other files in the store directory.
> +            for p, kind, st in repo.store.vfs.readdir('', stat=True):
> +                # Skip revlogs.
> +                if p.endswith(('.i', '.d')):
> +                    continue
> +                # Skip transaction related files.
> +                if p.startswith('undo'):
> +                    continue
> +                # Skip other skipped files.
> +                if p in ignorestorefiles:
> +                    continue
> +                # Only copy regular files.
> +                if kind != stat.S_IFREG:
> +                    continue
> +
> +                repo.ui.write(_('copying %s\n') % p)
> +                src = repo.store.vfs.join(p)
> +                dst = tmprepo.store.vfs.join(p)
> +                util.copyfile(src, dst, copystat=True)
> +
> +        scmutil.writerequires(repo.vfs, requirements)
> +
> +        # Now rename and swap the 2 store directories. Doing it as a rename
> +        # should make the operation nearly instantaneous.
> +        bakpath = repo.vfs.join('store.bak')
> +        util.rename(repo.spath, bakpath)
> +        util.rename(tmprepo.spath, repo.spath)

This won't be instantaneous on bad file systems like FAT.
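Whatever the speed on a given file system, the two renames are not atomic as a pair. If the second one fails, the repository is left without a store. A sketch that at least undoes the first rename in that case, assuming all three paths are on the same file system and the backup path does not exist yet. `swap_dirs` is a hypothetical helper, not `util.rename`.

```python
import os


def swap_dirs(current, replacement, backup):
    """Move `current` aside to `backup`, then move `replacement` into place.

    If the second rename fails, the first one is undone so the repository
    keeps its original store, and the error is re-raised.
    """
    os.rename(current, backup)
    try:
        os.rename(replacement, current)
    except OSError:
        os.rename(backup, current)
        raise
```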

-- 
Pierre-Yves David

