[PATCH 3 of 7 upgraderepo V2] repair: determine what upgrade will do

Augie Fackler raf at durin42.com
Tue Dec 13 11:15:29 EST 2016


On Thu, Nov 24, 2016 at 07:28:31PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1480035243 28800
> #      Thu Nov 24 16:54:03 2016 -0800
> # Node ID 328d95ac1fd128359a2665f2959e030ee8cce046
> # Parent  4fffbdc2bca22e666482b55562c0e592c8a68027
> repair: determine what upgrade will do
>
> This commit is rather large. It started off as multiple commits
> but I couldn't figure out a good way to split up the functionality
> while still conveying the overall strategy. To the reviewer, I
> apologize.
>
> This commit introducts a ton of functionality around determining
> how a repository can be upgraded.
>
> The first code you'll see is related to repository requirements and
> updates. There are no less than 5 functions returnings sets of
> requirements that control upgrading. Why so many functions?
> Mainly extensions. These would ideally be module variables. But
> extensions will need to inject themselves into the upgrade process.
> And if you are going to make 1 set extensible, you might as well
> make them all extensible.
>
> Astute readers will see that we don't support "manifestv2" and
> "treemanifest" requirements in the upgrade mechanism. I don't have
> a great answer for why other than this is a complex set of patches
> and I don't want to deal with the complexity of these experimental
> features just yet. We can teach the upgrade mechanism about them
> later, once the basic upgrade mechanism is in place.
>
> The "upgradefindimprovements" function introduces a mechanism to
> return a list of improvements that can be made to a repository.
> Each improvement is effectively an action that an upgrade will
> perform. Associated with each of these improvements is metadata
> that will be used to inform users what's wrong and what an
> upgrade will do.
>
> Each "improvement" is categorized as a "deficiency" or an
> "optimization." TBH, I'm not thrilled about the terminology and
> am receptive to constructive bikeshedding. The main difference
> between a "deficiency" and an "optimization" is a deficiency
> is always corrected (if it deviates from the current config) and
> an "optimization" is an optional action that goes above and beyond
> to improve the state of the repository (usually by requiring more
> CPU during upgrade).
>
> Our initial set of improvements identifies missing repository
> requirements, a single, easily correctable problem with
> changelog storage, and a set of "optimizations" related to delta
> recalculation.
>
> The main "upgraderepo" function glues all of this code together.
> It verifies that the repository is capable of being upgraded by
> looking at the state of current and proposed requirements and
> every difference therein. It then queries for the list of
> improvements and determines which of them will run based on the
> current repository state and user arguments. A non-trivial amount
> of code is required to print all the improvements and what
> actions will be taken by an upgrade. Some of this is factored into
> inline functions to facilitate use in a subsequent commit.
>
> I went through numerous iterations of the output format before
> settling on a ReST-inspired definition list format. (I used
> bulleted lists in the first submission of this commit and could
> not get it to format just right.) Even with the various iterations,
> I'm still not super thrilled with the format. But, this is a debug*
> command, so that should mean we can refine the output without BC
> concerns.
>
> Rounding out the code is a series of new tests.
>
> diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
> --- a/mercurial/debugcommands.py
> +++ b/mercurial/debugcommands.py
> @@ -34,6 +34,7 @@ from . import (
>      localrepo,
>      lock as lockmod,
>      pycompat,
> +    repair,
>      revlog,
>      scmutil,
>      setdiscovery,
> @@ -873,4 +874,4 @@ def debugupgraderepo(ui, repo, run=False
>      should complete almost instantaneously and the chances of a consumer being
>      unable to access the repository should be low.
>      """
> -    raise error.Abort(_('not yet implemented'))
> +    return repair.upgraderepo(ui, repo, run=run, optimize=optimize)
> diff --git a/mercurial/repair.py b/mercurial/repair.py
> --- a/mercurial/repair.py
> +++ b/mercurial/repair.py
> @@ -360,3 +360,417 @@ def deleteobsmarkers(obsstore, indices):
>          newobsstorefile.write(bytes)
>      newobsstorefile.close()
>      return n

[...]

> +def upgradefindimprovements(repo):
> +    """Determine improvements that can be made to the repo during upgrade.
> +
> +    Returns a list of dicts describing repository deficiencies and
> +    optimizations. Each dict value has the following keys:

Can we use namedtuples instead for this?

> +
> +    name
> +       Machine-readable string uniquely identifying this improvement. It
> +       will be mapped to an action later in the upgrade process.
> +
> +    type
> +       Either ``deficiency`` or ``optimization``. A deficiency is an obvious
> +       problem. An ``optimization`` is an action (sometimes optional) that
> +       can be taken to further improve the state of the repository.

Some kind of enum constant setup for this instead of magic strings please. I'd be very happy with

TYPE_DEFICIENCY and TYPE_OPTIMIZATION for now.

>
> +    description
> +       Message intended for humans explaining the improvement in more detail,
> +       including the implications of it. For ``deficiency`` types, should be
> +       worded in the present tense. For ``optimization`` types, should be
> +       worded in the future tense.
> +
> +    upgrademessage
> +       Message intended for humans explaining what an upgrade addressing this
> +       issue will do. Should be worded in the future tense.
> +
> +    fromdefault (``deficiency`` types only)
> +       Boolean indicating whether the current (deficient) state deviates
> +       from Mercurial's default configuration.
> +
> +    fromconfig (``deficiency`` types only)
> +       Boolean indicating whether the current (deficient) state deviates
> +       from the current Mercurial configuration.
> +    """

[...]

> +
> +def upgradedetermineactions(repo, improvements, sourcereqs, destreqs,
> +                            optimize):
> +    """Determine upgrade actions that will be performed.
> +
> +    Given a list of improvements as returned by ``upgradefindimprovements``,
> +    determine the list of upgrade actions that will be performed.
> +
> +    The role of this function is to filter improvements if needed, apply
> +    recommended optimizations from the improvements list that make sense,
> +    etc.
> +
> +    Returns a list of action names.
> +    """
> +    newactions = []
> +
> +    knownreqs = upgradesupporteddestrequirements(repo)
> +
> +    for i in improvements:
> +        name = i['name']
> +
> +        # If the action is a requirement that doesn't show up in the
> +        # destination requirements, prune the action.
> +        if name in knownreqs and name not in destreqs:
> +            continue
> +
> +        if i['type'] == 'deficiency':
> +            newactions.append(name)
> +
> +    newactions.extend(o for o in sorted(optimize) if o not in newactions)
> +
> +    # FUTURE consider adding some optimizations here for certain transitions.
> +    # e.g. adding generaldelta could schedule parent redeltas.

Perhaps add a 'suggests' field to your namedtuple that has a list of
other transitions that may as well be done at the same time. Good idea for later.

> +
> +    return newactions
> +
> +def upgraderepo(ui, repo, run=False, optimize=None):
> +    """Upgrade a repository in place."""
> +    # Avoid cycle: cmdutil -> repair -> localrepo -> cmdutil
> +    from . import localrepo
> +
> +    optimize = set(optimize or [])
> +
> +    repo = repo.unfiltered()
> +
> +    # Ensure the repository can be upgraded.
> +    missingreqs = upgraderequiredsourcerequirements(repo) - repo.requirements
> +    if missingreqs:
> +        raise error.Abort(_('cannot upgrade repository; requirement '
> +                            'missing: %s') % _(', ').join(sorted(missingreqs)))
> +
> +    blockedreqs = upgradeblocksourcerequirements(repo) & repo.requirements

It weirds me out a little th have this be a blacklist, the more I get
into this, but I can't put my finger on a specific case that would be
problematic. Yet, anyway.

> +    if blockedreqs:
> +        raise error.Abort(_('cannot upgrade repository; unsupported source '
> +                            'requirement: %s') %
> +                          _(', ').join(sorted(blockedreqs)))
> +
> +    # FUTURE there is potentially a need to control the wanted requirements via
> +    # command arguments or via an extension hook point.
> +    newreqs = localrepo.newreporequirements(repo)
> +
> +    noremovereqs = (repo.requirements - newreqs -
> +                   upgradesupportremovedrequirements(repo))
> +    if noremovereqs:
> +        raise error.Abort(_('cannot upgrade repository; requirement would be '
> +                            'removed: %s') % _(', ').join(sorted(noremovereqs)))
> +
> +    noaddreqs = (newreqs - repo.requirements -
> +                 upgradeallowednewrequirements(repo))
> +    if noaddreqs:
> +        raise error.Abort(_('cannot upgrade repository; do not support adding '
> +                            'requirement: %s') %
> +                          _(', ').join(sorted(noaddreqs)))
> +
> +    unsupportedreqs = newreqs - upgradesupporteddestrequirements(repo)
> +    if unsupportedreqs:
> +        raise error.Abort(_('cannot upgrade repository; do not support '
> +                            'destination requirement: %s') %
> +                          _(', ').join(sorted(unsupportedreqs)))
> +
> +    # Find and validate all improvements that can be made.
> +    improvements = upgradefindimprovements(repo)
> +    for i in improvements:
> +        if i['type'] not in ('deficiency', 'optimization'):
> +            raise error.Abort(_('unexpected improvement type %s for %s') % (
> +                i['type'], i['name']))
> +
> +    # Validate arguments.
> +    unknownoptimize = optimize - set(i['name'] for i in improvements
> +                                     if i['type'] == 'optimization')
> +    if unknownoptimize:
> +        raise error.Abort(_('unknown optimization action requested: %s') %
> +                          ', '.join(sorted(unknownoptimize)),
> +                          hint=_('run without arguments to see valid '
> +                                 'optimizations'))
> +
> +    actions = upgradedetermineactions(repo, improvements, repo.requirements,
> +                                      newreqs, optimize)
> +
> +    def printrequirements():
> +        ui.write(_('requirements\n'))
> +        ui.write(_('   preserved: %s\n') %
> +                 _(', ').join(sorted(newreqs & repo.requirements)))
> +
> +        if repo.requirements - newreqs:
> +            ui.write(_('   removed: %s\n') %
> +                     _(', ').join(sorted(repo.requirements - newreqs)))
> +
> +        if newreqs - repo.requirements:
> +            ui.write(_('   added: %s\n') %
> +                     _(', ').join(sorted(newreqs - repo.requirements)))
> +
> +        ui.write('\n')
> +
> +    def printupgradeactions():
> +        for action in actions:
> +            for i in improvements:
> +                if i['name'] == action:
> +                    ui.write('%s\n   %s\n\n' %
> +                             (i['name'], i['upgrademessage']))
> +
> +    if not run:
> +        fromdefault = []
> +        fromconfig = []
> +        optimizations = []
> +
> +        for i in improvements:
> +            assert i['type'] in ('deficiency', 'optimization')
> +            if i['type'] == 'deficiency':
> +                if i['fromdefault']:
> +                    fromdefault.append(i)
> +                if i['fromconfig']:
> +                    fromconfig.append(i)
> +            else:
> +                optimizations.append(i)
> +
> +        if fromdefault or fromconfig:
> +            fromconfignames = set(x['name'] for x in fromconfig)
> +            onlydefault = [i for i in fromdefault
> +                           if i['name'] not in fromconfignames]
> +
> +            if fromconfig:
> +                ui.write(_('repository lacks features recommended by '
> +                           'current config options:\n\n'))
> +                for i in fromconfig:
> +                    ui.write('%s\n   %s\n\n' % (i['name'], i['description']))
> +
> +            if onlydefault:
> +                ui.write(_('repository lacks features used by the default '
> +                           'config options:\n\n'))
> +                for i in onlydefault:
> +                    ui.write('%s\n   %s\n\n' % (i['name'], i['description']))
> +
> +            ui.write('\n')
> +        else:
> +            ui.write(_('(no feature deficiencies found in existing '
> +                       'repository)\n'))
> +
> +        ui.write(_('performing an upgrade with "--run" will make the following '
> +                   'changes:\n\n'))
> +
> +        printrequirements()
> +        printupgradeactions()
> +
> +        unusedoptimize = [i for i in improvements
> +                          if i['name'] not in actions and
> +                             i['type'] == 'optimization']
> +        if unusedoptimize:
> +            ui.write(_('additional optimizations are available by specifying '
> +                     '"--optimize <name>":\n\n'))
> +            for i in unusedoptimize:
> +                ui.write(_('%s\n   %s\n\n') % (i['name'], i['description']))
> +
> +        return
> diff --git a/tests/test-upgrade-repo.t b/tests/test-upgrade-repo.t
> --- a/tests/test-upgrade-repo.t
> +++ b/tests/test-upgrade-repo.t
> @@ -1,5 +1,181 @@
> +  $ cat >> $HGRCPATH << EOF
> +  > [extensions]
> +  > share =
> +  > EOF
> +
> +store and revlogv1 are required in source
> +
> +  $ hg --config format.usestore=false init no-store
> +  $ hg -R no-store debugupgraderepo
> +  abort: cannot upgrade repository; requirement missing: store
> +  [255]
> +
> +  $ hg init no-revlogv1
> +  $ cat > no-revlogv1/.hg/requires << EOF
> +  > dotencode
> +  > fncache
> +  > generaldelta
> +  > store
> +  > EOF
> +
> +  $ hg -R no-revlogv1 debugupgraderepo
> +  abort: cannot upgrade repository; requirement missing: revlogv1
> +  [255]
> +
> +Cannot upgrade shared repositories
> +
> +  $ hg init share-parent
> +  $ hg -q share share-parent share-child
> +
> +  $ hg -R share-child debugupgraderepo
> +  abort: cannot upgrade repository; unsupported source requirement: shared
> +  [255]
> +
> +Do not yet support upgrading manifestv2 and treemanifest repos
> +
> +  $ hg --config experimental.manifestv2=true init manifestv2
> +  $ hg -R manifestv2 debugupgraderepo
> +  abort: cannot upgrade repository; unsupported source requirement: manifestv2
> +  [255]
> +
> +  $ hg --config experimental.treemanifest=true init treemanifest
> +  $ hg -R treemanifest debugupgraderepo
> +  abort: cannot upgrade repository; unsupported source requirement: treemanifest
> +  [255]
> +
> +Cannot add manifestv2 or treemanifest requirement during upgrade
> +
> +  $ hg init disallowaddedreq
> +  $ hg -R disallowaddedreq --config experimental.manifestv2=true --config experimental.treemanifest=true debugupgraderepo
> +  abort: cannot upgrade repository; do not support adding requirement: manifestv2, treemanifest
> +  [255]
> +
> +An upgrade of a repository created with recommended settings only suggests optimizations
> +
>    $ hg init empty
>    $ cd empty
>    $ hg debugupgraderepo
> -  abort: not yet implemented
> -  [255]
> +  (no feature deficiencies found in existing repository)
> +  performing an upgrade with "--run" will make the following changes:
> +
> +  requirements
> +     preserved: dotencode, fncache, generaldelta, revlogv1, store
> +
> +  additional optimizations are available by specifying "--optimize <name>":
> +
> +  redeltaparent
> +     deltas within internal storage will be recalculated to choose an optimal base revision where this was not already done; the size of the repository may shrink and various operations may become faster; the first time this optimization is performed could slow down upgrade execution considerably; subsequent invocations should not run noticeably slower
> +
> +  redeltamultibase
> +     deltas within internal storage will be recalculated against multiple base revision and the smallest difference will be used; the size of the repository may shrink significantly when there are many merges; this optimization will slow down execution in proportion to the number of merges in the repository and the amount of files in the repository; this slow down should not be significant unless there are tens of thousands of files and thousands of merges
> +
> +  redeltaall
> +     deltas within internal storage will always be recalculated without reusing prior deltas; this will likely make execution run several times slower; this optimization is typically not needed
> +
> +
> +--optimize can be used to add optimizations
> +
> +  $ hg debugupgrade --optimize redeltaparent
> +  (no feature deficiencies found in existing repository)
> +  performing an upgrade with "--run" will make the following changes:
> +
> +  requirements
> +     preserved: dotencode, fncache, generaldelta, revlogv1, store
> +
> +  redeltaparent
> +     deltas within internal storage will choose a new base revision if needed
> +
> +  additional optimizations are available by specifying "--optimize <name>":
> +
> +  redeltamultibase
> +     deltas within internal storage will be recalculated against multiple base revision and the smallest difference will be used; the size of the repository may shrink significantly when there are many merges; this optimization will slow down execution in proportion to the number of merges in the repository and the amount of files in the repository; this slow down should not be significant unless there are tens of thousands of files and thousands of merges
> +
> +  redeltaall
> +     deltas within internal storage will always be recalculated without reusing prior deltas; this will likely make execution run several times slower; this optimization is typically not needed
> +
> +
> +Various sub-optimal detections work
> +
> +  $ cat > .hg/requires << EOF
> +  > revlogv1
> +  > store
> +  > EOF
> +
> +  $ hg debugupgraderepo
> +  repository lacks features recommended by current config options:
> +
> +  fncache
> +     long and reserved filenames may not work correctly; repository performance is sub-optimal
> +
> +  dotencode
> +     storage of filenames beginning with a period or space may not work correctly
> +
> +  generaldelta
> +     deltas within internal storage are unable to choose optimal revisions; repository is larger and slower than it could be; interaction with other repositories may require extra network and CPU resources, making "hg push" and "hg pull" slower
> +
> +
> +  performing an upgrade with "--run" will make the following changes:
> +
> +  requirements
> +     preserved: revlogv1, store
> +     added: dotencode, fncache, generaldelta
> +
> +  fncache
> +     repository will be more resilient to storing certain paths and performance of certain operations should be improved
> +
> +  dotencode
> +     repository will be better able to store files beginning with a space or period
> +
> +  generaldelta
> +     repository storage will be able to create optimal deltas; new repository data will be smaller and read times should decrease; interacting with other repositories using this storage model should require less network and CPU resources, making "hg push" and "hg pull" faster
> +
> +  additional optimizations are available by specifying "--optimize <name>":
> +
> +  redeltaparent
> +     deltas within internal storage will be recalculated to choose an optimal base revision where this was not already done; the size of the repository may shrink and various operations may become faster; the first time this optimization is performed could slow down upgrade execution considerably; subsequent invocations should not run noticeably slower
> +
> +  redeltamultibase
> +     deltas within internal storage will be recalculated against multiple base revision and the smallest difference will be used; the size of the repository may shrink significantly when there are many merges; this optimization will slow down execution in proportion to the number of merges in the repository and the amount of files in the repository; this slow down should not be significant unless there are tens of thousands of files and thousands of merges
> +
> +  redeltaall
> +     deltas within internal storage will always be recalculated without reusing prior deltas; this will likely make execution run several times slower; this optimization is typically not needed
> +
> +
> +  $ hg --config format.dotencode=false debugupgraderepo
> +  repository lacks features recommended by current config options:
> +
> +  fncache
> +     long and reserved filenames may not work correctly; repository performance is sub-optimal
> +
> +  generaldelta
> +     deltas within internal storage are unable to choose optimal revisions; repository is larger and slower than it could be; interaction with other repositories may require extra network and CPU resources, making "hg push" and "hg pull" slower
> +
> +  repository lacks features used by the default config options:
> +
> +  dotencode
> +     storage of filenames beginning with a period or space may not work correctly
> +
> +
> +  performing an upgrade with "--run" will make the following changes:
> +
> +  requirements
> +     preserved: revlogv1, store
> +     added: fncache, generaldelta
> +
> +  fncache
> +     repository will be more resilient to storing certain paths and performance of certain operations should be improved
> +
> +  generaldelta
> +     repository storage will be able to create optimal deltas; new repository data will be smaller and read times should decrease; interacting with other repositories using this storage model should require less network and CPU resources, making "hg push" and "hg pull" faster
> +
> +  additional optimizations are available by specifying "--optimize <name>":
> +
> +  redeltaparent
> +     deltas within internal storage will be recalculated to choose an optimal base revision where this was not already done; the size of the repository may shrink and various operations may become faster; the first time this optimization is performed could slow down upgrade execution considerably; subsequent invocations should not run noticeably slower
> +
> +  redeltamultibase
> +     deltas within internal storage will be recalculated against multiple base revision and the smallest difference will be used; the size of the repository may shrink significantly when there are many merges; this optimization will slow down execution in proportion to the number of merges in the repository and the amount of files in the repository; this slow down should not be significant unless there are tens of thousands of files and thousands of merges
> +
> +  redeltaall
> +     deltas within internal storage will always be recalculated without reusing prior deltas; this will likely make execution run several times slower; this optimization is typically not needed
> +
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list