[PATCH RFC] repair: add mechanism to convert/upgrade a repo in place

Gregory Szorc gregory.szorc at gmail.com
Sat Mar 12 20:35:47 EST 2016


On Tue, Feb 23, 2016 at 6:26 AM, Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 02/16/2016 12:47 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1455580051 28800
>> #      Mon Feb 15 15:47:31 2016 -0800
>> # Node ID d65feeab5bc118778d9523e7df44fe38804046ce
>> # Parent  a3fcea8d55f7c2b3e9d83c00cbe303890a906775
>> repair: add mechanism to convert/upgrade a repo in place
>>
>> Pierre-Yves wants a debug command to "upgrade" repositories to
>> generaldelta. Taking a step backward, upgrading a repository to
>> generaldelta is a subset of the general task of adding a new
>> requirement to an existing repository.
>>
>
> Bitbucket would be delighted to have it.
>
>> This patch begins the implementation of a generic, in-place
>> repository "upgrade" mechanism that can be performed on a live
>> repo with minimal downtime (as opposed to `hg clone` which
>> doesn't take out a lock on the source repo and therefore is
>> susceptible to repositories gaining new data while operating
>> on them).
>>
>
> I like the idea. This is a smart trick.
>
>> It basically creates a new, empty repo and then iterates over all
>> store files from the old repository and copies them, applying any
>> requirements differences along the way. Not only will non-gd repos
>> get converted to gd, but fncache and dotencode will be added as well,
>> assuming they haven't been disabled.
>>
>> This patch is RFC quality. We need tests. We also need to consider
>> how extensions need to hook into this.
>>
>
> I like the idea! (Even if I have several comments on the current
> implementation.) What's your plan for extension hooks?


I'm not sure what the plan for extension hooking is. I'll think of
something. I imagine it will be one of those things where we add hook
points once we figure out what is needed. Someone who has hacked around
with largefiles, narrowhg, lz4revlog, or anything else touching store foo
could probably set me on the right track...


>
>
>> diff --git a/mercurial/commands.py b/mercurial/commands.py
>> --- a/mercurial/commands.py
>> +++ b/mercurial/commands.py
>> @@ -3516,16 +3516,26 @@ def debugsuccessorssets(ui, repo, *revs)
>>               if succsset:
>>                   ui.write('    ')
>>                   ui.write(node2str(succsset[0]))
>>                   for node in succsset[1:]:
>>                       ui.write(' ')
>>                       ui.write(node2str(node))
>>               ui.write('\n')
>>
>> +@command('debugupgraderepo')
>> +def debugupgraderepo(ui, repo):
>> +    """upgrade a repository to use different features
>> +
>> +    During the upgrade, errors may be encountered when reading from the
>> +    repository. This command should therefore not be executed on live
>> +    repositories.
>> +    """
>> +    repair.upgraderepo(repo)
>> +
>>   @command('debugwalk', walkopts, _('[OPTION]... [FILE]...'), inferrepo=True)
>>   def debugwalk(ui, repo, *pats, **opts):
>>       """show how files match on given patterns"""
>>       m = scmutil.match(repo[None], pats, opts)
>>       items = list(repo.walk(m))
>>       if not items:
>>           return
>>       f = lambda fn: fn
>> diff --git a/mercurial/repair.py b/mercurial/repair.py
>> --- a/mercurial/repair.py
>> +++ b/mercurial/repair.py
>> @@ -4,24 +4,28 @@
>>   # Copyright 2007 Matt Mackall
>>   #
>>   # This software may be used and distributed according to the terms of the
>>   # GNU General Public License version 2 or any later version.
>>
>>   from __future__ import absolute_import
>>
>>   import errno
>> +import stat
>>
>>   from .i18n import _
>>   from .node import short
>>   from . import (
>>       bundle2,
>>       changegroup,
>>       error,
>>       exchange,
>> +    localrepo,
>> +    revlog,
>> +    scmutil,
>>       util,
>>   )
>>
>>   def _bundle(repo, bases, heads, node, suffix, compress=True):
>>       """create a bundle with the specified revisions as a backup"""
>>       cgversion = changegroup.safeversion(repo)
>>
>>       cg = changegroup.changegroupsubset(repo, bases, heads, 'strip',
>> @@ -307,8 +311,129 @@ def stripbmrevset(repo, mark):
>>
>>       Needs to live here so extensions can use it and wrap it even when strip is
>>       not enabled or not present on a box.
>>       """
>>       return repo.revs("ancestors(bookmark(%s)) - "
>>                        "ancestors(head() and not bookmark(%s)) - "
>>                        "ancestors(bookmark() and not bookmark(%s))",
>>                        mark, mark, mark)
>> +
>> +# Repository requirements that upgraderepo() can support.
>> +supportedupgraderequirements = set([
>> +    'fncache',
>> +    'dotencode',
>> +    'generaldelta',
>> +    'revlogv1',
>> +    'store',
>> +])
>> +
>> +# Files that should not be copied to the new store as part of an upgrade.
>> +ignorestorefiles = set([
>> +    'lock',
>> +    'fncache',
>> +])
>> +
>> +def upgraderepo(repo):
>> +    """Convert a repository to use different features/requirements.
>> +
>> +    This function performs an in-place "upgrade" of a repository to use
>> +    a different set of repository/store features/requirements. It is
>> +    intended to convert repositories to use modern features.
>> +    """
>> +    repo = repo.unfiltered()
>> +
>> +    if 'store' not in repo.requirements:
>> +        raise util.Abort(_('cannot convert repositories missing the "store" '
>> +                           'requirement'),
>> +                         hint=_('use "hg clone --pull"'))
>>
>
> Can't we ? We could build a store directory and drop it in place?
>

We probably can. I just wasn't knowledgeable enough about what the
pre-store layout was like to feel comfortable implementing this.


>
>> +    # FUTURE provide ability to adjust requirements via function arguments.
>> +    createreqs = localrepo.newreporequirements(repo)
>> +    missingreqs = createreqs - repo.requirements
>> +    removedreqs = repo.requirements - createreqs
>>
>
> Why not just use the config "format" section for that? That way you can
> upgrade your repo fleet by following a global config in a snapshot. Also, the
> way you control the format for cloning and for upgrading would be the same.
>
>
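
For readers following the hunk above, the requirement comparison is
plain set arithmetic. Below is a minimal standalone sketch of it. The
requirement names come from the patch, but the two input sets and the
planupgrade() helper are hypothetical illustrations, not Mercurial
API, and ValueError stands in for util.Abort.

```python
# Requirements the upgrade code knows how to add (copied from the patch).
supported = {'fncache', 'dotencode', 'generaldelta', 'revlogv1', 'store'}


def planupgrade(current, wanted, supported):
    """Return the sorted list of requirements to add.

    Hypothetical helper: raises ValueError where the patch would abort.
    """
    missing = wanted - current   # requirements the upgrade would add
    removed = current - wanted   # requirements the upgrade would drop
    if removed:
        raise ValueError('removing requirement not supported: %s'
                         % ', '.join(sorted(removed)))
    unsupported = missing - supported
    if unsupported:
        raise ValueError('new requirement not supported: %s'
                         % ', '.join(sorted(unsupported)))
    return sorted(missing)


# Made-up inputs: an old repo layout versus what a new repo would get.
current = {'revlogv1', 'store'}
wanted = {'dotencode', 'fncache', 'generaldelta', 'revlogv1', 'store'}
print(planupgrade(current, wanted, supported))
# prints ['dotencode', 'fncache', 'generaldelta']
```

Whether "wanted" is computed from function arguments or from the
"format" config section, the comparison itself stays the same.
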
>> +
>> +    if removedreqs:
>> +        raise util.Abort(_('cannot convert repository; removing requirement '
>> +                           'not supported: %s' %
>> +                           ', '.join(sorted(removedreqs))))
>>
>
> You should move this long message into a temporary variable. The multi-line
> thing is getting quite hard to follow.
>
>> +
>> +    unsupportedreqs = missingreqs - supportedupgraderequirements
>> +    if unsupportedreqs:
>> +        raise util.Abort(_('cannot convert repository; new requirement not '
>> +                           'supported: %s' %
>> +                           ', '.join(sorted(unsupportedreqs))))
>> +
>> +    repo.ui.write(_('adding requirements: %s\n' %
>> +                    ', '.join(sorted(missingreqs))))
>> +
>> +    with repo.wlock():
>> +        with repo.lock():
>> +            _upgradestore(repo, createreqs)
>> +
>> +            # TODO invalidate repo.svfs and other cached objects.
>>
>
> As Yuya smartly spotted, we have a race here with other readers that read the
> requirements before waiting on the lock. Should we temporarily overwrite the
> requirements file with some XXXBEINGUPGRADEDXXX requirement? We would still
> have a race, but it would be a small one.
>

I like this idea as a very crude way of implementing reader locks.
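
A minimal sketch of the placeholder idea, using plain files rather
than Mercurial's vfs layer. Everything here is an assumption made for
illustration: the helper names are hypothetical, the placeholder is
added alongside the old requirements rather than replacing them, and
the code targets Python 3 (os.replace), which is not what the patch
itself runs on.

```python
import os

# Placeholder name taken from the suggestion above; any string that no
# client recognizes as a supported requirement would do.
PLACEHOLDER = 'XXXBEINGUPGRADEDXXX'


def writerequires(path, requirements):
    """Write the requirements file via a temp file plus rename."""
    tmp = path + '.tmp'
    with open(tmp, 'w') as fh:
        for req in sorted(requirements):
            fh.write('%s\n' % req)
    os.replace(tmp, path)


def readrequires(path):
    with open(path) as fh:
        return set(fh.read().splitlines())


def upgradewithplaceholder(path, newreqs, doupgrade):
    """Run doupgrade() while the requirements file carries the placeholder.

    A reader that opens the repo during the upgrade sees an unknown
    requirement and refuses to proceed. A reader that already read the
    old file is not stopped, so a small race remains.
    """
    oldreqs = readrequires(path)
    writerequires(path, oldreqs | {PLACEHOLDER})
    try:
        doupgrade()
    except Exception:
        # Assumes a failed upgrade left the original store untouched.
        writerequires(path, oldreqs)
        raise
    writerequires(path, newreqs)
```

The caller would still hold wlock and lock around the whole thing; the
placeholder only narrows the window for readers that take no lock.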


>
>
>> +
>> +def _upgradestore(repo, requirements):
>> +    try:
>> +        # It is easier to create a new repo than to instantiate all the
>> +        # components separately.
>> +        tmprepo = localrepo.localrepository(repo.baseui,
>> +                                            path=repo.join('tmprepo'),
>> +                                            create=True)
>>
>
> I have a bad feeling about this, but I have no better proposal.


Instantiating the store and vfs objects is somewhat complicated. I felt it
easier to do this.


>
>
>> +        with tmprepo.transaction('upgrade') as tr:
>> +            # Start by cloning revlogs individually.
>> +            total = 0
>> +            for t in repo.store.walk():
>> +                if t[0].endswith('.i'):
>> +                    total += 1
>> +
>> +            i = 0
>> +            for unencoded, encoded, size in repo.store.walk():
>> +                if unencoded.endswith('.d'):
>> +                    continue
>> +
>> +                i += 1
>> +                repo.ui.progress('upgrade', i, total=total)
>> +
>> +                oldrl = revlog.revlog(repo.svfs, unencoded)
>> +                newrl = revlog.revlog(tmprepo.svfs, unencoded)
>> +
>> +                # generaldelta is never enabled on changelog because it isn't
>> +                # useful.
>> +                if unencoded == '00changelog.i':
>> +                    newrl.version &= ~revlog.REVLOGGENERALDELTA
>> +                    newrl._generaldelta = False
>> +
>> +                oldrl.clone(newrl, tr)
>> +
>> +            repo.ui.progress('upgrade', None)
>>
>
> Could we get a unified progress bar?
>
>
>> +
>> +            # Now copy other files in the store directory.
>> +            for p, kind, st in repo.store.vfs.readdir('', stat=True):
>> +                # Skip revlogs.
>> +                if p.endswith(('.i', '.d')):
>> +                    continue
>> +                # Skip transaction related files.
>> +                if p.startswith('undo'):
>> +                    continue
>> +                # Skip other skipped files.
>> +                if p in ignorestorefiles:
>> +                    continue
>> +                # Only copy regular files.
>> +                if kind != stat.S_IFREG:
>> +                    continue
>> +
>> +                repo.ui.write(_('copying %s\n' % p))
>> +                src = repo.store.vfs.join(p)
>> +                dst = tmprepo.store.vfs.join(p)
>> +                util.copyfile(src, dst, copystat=True)
>> +
>> +        scmutil.writerequires(repo.vfs, requirements)
>> +
>> +        # Now rename and swap the 2 store directories. Doing it as a rename
>> +        # should make the operation nearly instantaneous.
>> +        bakpath = repo.vfs.join('store.bak')
>> +        util.rename(repo.spath, bakpath)
>> +        util.rename(tmprepo.spath, repo.spath)
>>
>
> This won't be instant on a bad file system like FAT.
>
> --
> Pierre-Yves David
>
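
The directory swap at the end of the patch can be sketched outside
Mercurial as two renames. This is only an illustration: swapstore()
is a hypothetical name, it uses os.rename instead of util.rename, and
it assumes both directories live on the same file system (which is
typically the case, since the patch creates the temporary repo inside
the existing .hg directory).

```python
import os


def swapstore(repopath, tmpstore):
    """Move repopath/store aside and put tmpstore in its place.

    Returns the path of the backup directory. Between the two renames
    there is a short window in which no "store" directory exists, so
    the caller is expected to hold the repository locks.
    """
    store = os.path.join(repopath, 'store')
    backup = os.path.join(repopath, 'store.bak')
    os.rename(store, backup)    # old store -> store.bak
    os.rename(tmpstore, store)  # upgraded store -> store
    return backup
```

How fast each rename is depends on the file system, which is the
concern raised above about FAT.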