Shared Repositories

Current Situation

Shared repositories are currently implemented using the ShareExtension.

Many other extensions do not work well with the share extension. A big part of the problem is that there is no easy way to designate whether a given file should be shared. Furthermore, even in core, files that could or should be shared are intermixed with files that should not be. For example, the bookmarks file (.hg/bookmarks) and the active bookmark file (.hg/bookmarks.current) are in the same directory, but the active bookmark file describes working copy state while the bookmarks file describes repository state (indeed, it is updated on pull). This problem recurs in a large number of extensions, inside and outside of core, as well as in core features themselves, including shelve, history rewriting operations (eg, strip, rebase, and histedit, via the strip-backup directory), remotenames, and hgsubversion.

Cache files are affected as well: many caches can and should be shared between shared repos, but today none are. Only a handful of caches depend on the working copy parent (eg, visibility of obsolete changesets).

Currently, there are three vfs types: wvfs (the working directory), vfs (/.hg/), and svfs (/.hg/store/).

atomictemp is a parameter to vfs for writing a single file atomically, but it makes no guarantees about multiple files (that is handled today, rather imperfectly, by transactions).
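To illustrate what the single-file guarantee means, here is a minimal sketch of the temp-file-then-rename technique that atomic single-file writes rely on. It is not Mercurial's own atomictempfile code; the function name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write ``data`` to ``path`` so readers see either the old or the new file.

    Hypothetical helper sketching the temp-file-then-rename idea.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmppath = tempfile.mkstemp(prefix=".tmp-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
            fp.flush()
            os.fsync(fp.fileno())
        # os.replace swaps the file in one step when both paths share a filesystem.
        os.replace(tmppath, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmppath):
            os.unlink(tmppath)
        raise
```

Two consecutive atomic_write calls are still two independent renames: a crash between them leaves one file updated and the other not, which is exactly the gap that transactions have to cover.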

Problem Statement

To solve this problem, we need a generic, safe, and easy way for both core and extensions to designate a file as "shareable". Essentially, every file falls into one of three sharing categories:

  1. should not be shared (eg, dirstate)

  2. may be shared optionally (eg, bookmarks)

  3. always shared (eg, store/*)
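One way to make the three categories concrete is a small registry consulted whenever a file is opened. In this sketch, ShareMode, SHARE_CATEGORIES, category and is_shared are hypothetical names, and the entries only mirror the examples above; defaulting unknown files to "not shared" is an assumption, chosen as the conservative option.

```python
from enum import Enum


class ShareMode(Enum):
    NEVER = "never"        # 1. should not be shared, eg dirstate
    OPTIONAL = "optional"  # 2. shared only if the user opts in, eg bookmarks
    ALWAYS = "always"      # 3. always shared, eg store/*


# Hypothetical registry; real entries would be decided by core and extensions.
SHARE_CATEGORIES = {
    "dirstate": ShareMode.NEVER,
    "bookmarks.current": ShareMode.NEVER,
    "bookmarks": ShareMode.OPTIONAL,
}


def category(path):
    """Return the ShareMode for ``path``, relative to .hg/."""
    if path.startswith("store/"):
        return ShareMode.ALWAYS
    # Unknown files default to unshared, the conservative choice.
    return SHARE_CATEGORIES.get(path, ShareMode.NEVER)


def is_shared(path, optional_shared):
    """Return True if ``path`` should live in the shared location.

    ``optional_shared`` is the set of OPTIONAL files the user chose to share,
    eg {"bookmarks"} for a share created with bookmark sharing enabled.
    """
    mode = category(path)
    if mode is ShareMode.ALWAYS:
        return True
    if mode is ShareMode.OPTIONAL:
        return path in optional_shared
    return False
```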

Atomic transactions add another layer of complexity (see AtomicRepositoryLayoutPlan). Some files need to be updated together, atomically. Shared files that need atomic updates include bookmarks when a commit is made and remotenames during a pull. The plan (based on a chat with Pierre-Yves) is to have a file that points to a directory name (or multiple directory names) where the current versions of files are. This file will then be updated atomically (via a tempfile rename). This atomicity will need to work with both shared and unshared files. To be fully correct, shared files will need to be updated in transactions that also include unshared files (eg, a pull that updates both local unshared bookmarks and shared remote names).
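A minimal sketch of the pointer-file plan, assuming a hypothetical layout where <root>/current holds a directory name and <root>/versions/<name>/ holds every file for that version. A commit writes a complete new version directory, then swaps the pointer with a single tempfile rename, so readers see either all of the updates or none of them. PointerStore and the directory names are illustrative only.

```python
import os
import shutil
import tempfile
import uuid


class PointerStore:
    """Files whose current versions live in the directory named by a pointer file."""

    def __init__(self, root):
        self.root = root
        self.versions = os.path.join(root, "versions")
        os.makedirs(self.versions, exist_ok=True)

    def _pointer(self):
        return os.path.join(self.root, "current")

    def current_dir(self):
        """Return the directory the pointer names, or None before the first commit."""
        try:
            with open(self._pointer()) as fp:
                name = fp.read().strip()
        except FileNotFoundError:
            return None
        return os.path.join(self.versions, name)

    def read(self, name):
        """Return the current contents of ``name``, or None if it does not exist."""
        cur = self.current_dir()
        if cur is None:
            return None
        try:
            with open(os.path.join(cur, name), "rb") as fp:
                return fp.read()
        except FileNotFoundError:
            return None

    def commit(self, updates):
        """Apply ``{filename: bytes}`` so readers see all updates or none.

        Assumes the caller already holds the relevant write lock.
        """
        newname = uuid.uuid4().hex
        newdir = os.path.join(self.versions, newname)
        cur = self.current_dir()
        # Start from a copy of the current version so untouched files carry over.
        if cur is not None:
            shutil.copytree(cur, newdir)
        else:
            os.makedirs(newdir)
        for fname, data in updates.items():
            with open(os.path.join(newdir, fname), "wb") as fp:
                fp.write(data)
        # The only step readers can observe: swap the pointer via tempfile + rename.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w") as fp:
            fp.write(newname)
        os.replace(tmp, self._pointer())
```

Old version directories are deliberately left in place, since a reader may have resolved the pointer just before it moved; they would need separate garbage collection. Under this scheme, a transaction spanning shared and unshared files would use a pointer that names two directories (one shared, one local), as the plan allows.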

Locking

Write operations on shared repositories are fundamentally multi-repository operations, so a locking scheme must be designed. Today, we take wlock (which protects /.hg/* except /.hg/store/*) and then lock (which protects /.hg/store/*) (see LockingDesign). Operations on shared repositories will also need to take locks on the source repository, so locking order is important. Since we acquire wlock first, it makes sense to acquire the shared file lock next and the store lock last.
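The ordering rule can be sketched as a helper that always acquires locks in one global order (wlock, then the shared file lock, then the store lock), regardless of the order callers request them in, and releases them in reverse. The lock names and the use of in-process threading.Lock objects are assumptions for illustration; Mercurial's real locks are lock files on disk.

```python
import contextlib
import threading


class OrderedLocks:
    """Acquire named locks in one fixed global order; release in reverse.

    Hypothetical names; threading locks stand in for on-disk lock files.
    """

    ORDER = ("wlock", "sharedlock", "lock")

    def __init__(self):
        self._locks = {name: threading.Lock() for name in self.ORDER}
        self.log = []  # (event, name) pairs, for inspection

    @contextlib.contextmanager
    def acquire(self, *names):
        unknown = set(names) - set(self.ORDER)
        if unknown:
            raise ValueError("unknown locks: %s" % sorted(unknown))
        # Ignore the caller's order and follow ORDER, so no two callers
        # can each hold a lock the other is waiting for.
        wanted = [name for name in self.ORDER if name in names]
        with contextlib.ExitStack() as stack:
            for name in wanted:
                self._locks[name].acquire()
                self.log.append(("acquire", name))
                # ExitStack runs callbacks LIFO, so release order is reversed.
                stack.callback(self._release, name)
            yield

    def _release(self, name):
        self._locks[name].release()
        self.log.append(("release", name))
```

Because every caller goes through the same order, the wlock-vs-lock style of deadlock reported in issue4858 cannot arise between callers that use this helper.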

See also issue4858 ("Share extension can cause wlock vs lock deadlock") for details about a possible deadlock in the current implementation (as of at least version 3.7.3).

Solution proposals

Solution A

Add a vfs for files that can be shared. Files opened with this vfs will be shared by default. Add a mechanism for excluding files from sharing by configuration (eg, for when bookmarks are not shared).
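A minimal sketch of such a shared vfs, assuming a hypothetical SharedVfs class that receives the local .hg/ path, the source repository's .hg/ path, and a set of excluded file names read from configuration. Real vfs objects do much more (path auditing, encoding, atomictemp); only the routing decision is shown.

```python
import os


class SharedVfs:
    """Hypothetical shared vfs: files resolve to the source repo's .hg/
    unless configuration excludes them, in which case they stay local.
    """

    def __init__(self, localbase, sharedbase, excluded=()):
        self.localbase = localbase    # this repository's .hg/
        self.sharedbase = sharedbase  # the source repository's .hg/
        self.excluded = frozenset(excluded)  # eg {"bookmarks"} if not shared

    def join(self, path):
        """Return the full path, choosing the local or shared base."""
        base = self.localbase if path in self.excluded else self.sharedbase
        return os.path.join(base, path)

    def __call__(self, path, mode="rb"):
        full = self.join(path)
        # Create parent directories only when opening for writing.
        if any(c in mode for c in "wax+"):
            os.makedirs(os.path.dirname(full), exist_ok=True)
        return open(full, mode)
```

A caller would open, say, remotenames through this vfs and get the source repository's copy, while an excluded bookmarks file stays in the local .hg/.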

Over time, we can migrate uses of vfs to this new shared vfs (unfortunately, the name svfs is already taken, so it will need another name). As each use is migrated, we will need to make sure the shared lock is acquired as well. Most extensions will need to be modified to use this new vfs.

pros: safe, easy first step (just new APIs); gradual change

cons: many more places to update, long tail of work that, realistically, never completes

Solution B

Make the standard vfs share files by default. Make locking acquire the working directory locks (wlock) of both the shared and the source repository by default (and introduce new, more granular locks in the future). Opt out files that should not be shared (eg, dirstate, active bookmark, etc).

pros: fixes most share problems immediately; can potentially be implemented by an extension

cons: big backwards compatibility break; unlikely to be implementable as part of the share extension

Solution C

Make Mercurial aware of multiple working copies. Instead of shared repos sharing only the state in /.hg/store/ (and optionally /.hg/bookmarks), keep everything in the same repository directory, but tie each working copy to a particular dirstate in the repo. This can be accomplished by adding a suffix to all dirstate-related files or, in the future, by putting all dirstate-related files into a directory per working directory. A working copy then becomes a "thin client" that talks to a repo (which can live anywhere).
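The two naming schemes for per-working-copy files might look like this. WORKDIR_FILES, repo_path, the suffix format and the workdirs/ directory are all hypothetical; the file list only includes the examples of working copy state mentioned on this page.

```python
import os

# Files tied to a particular working copy (illustrative, not exhaustive).
WORKDIR_FILES = frozenset({"dirstate", "bookmarks.current"})


def repo_path(hgdir, name, workdir_id, per_directory=False):
    """Map a .hg/ filename to its location in a single multi-workdir repo.

    Shared state keeps its usual path. Working copy state gets either a
    suffix (dirstate.<id>) or, in the later layout, a per-workdir
    directory (workdirs/<id>/dirstate). Both layouts are hypothetical.
    """
    if name not in WORKDIR_FILES:
        return os.path.join(hgdir, name)
    if per_directory:
        return os.path.join(hgdir, "workdirs", workdir_id, name)
    return os.path.join(hgdir, "%s.%s" % (name, workdir_id))
```

Since all of these paths sit under one .hg/, every lock a working copy needs also lives in that single repository, which is the locking advantage listed below.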

pros: provides powerful new functionality to Mercurial; works well with extensions; elegantly solves locking issues (all locks are in the single repository)

cons: new unproven idea; deep changes may be required

Dangers of shared repositories

SharedRepository (last edited 2017-01-13 17:28:10 by AntonShestakov)