Note:

This page is primarily intended for developers of Mercurial.

Path Conflicts

Status: Early Proposal

Main proponents: MarkThomas

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

1. The Problem

Path conflicts occur during merges when one side of the merge has a file or link that has the same path name as a directory containing one or more files or links on the other side. These can occur in either direction: the local repository might have a file that is a directory on the remote, or vice versa. In general, we say that the path conflict occurs at the shortest path, i.e. the path that is a file on one side and a directory containing one or more files on the other side.

Currently Mercurial deals only with files, and does not handle path conflicts at all. This can lead to serious bugs, particularly when the conflict occurs at a symlink to a directory. The problems occur in two places in particular: updating and merging.

An example bug that is caused by this: issue5628

2. Current Behaviour

In this document, “local” means the target revision of a merge, or the working copy in the case of update, and “remote” means the source revision. These are the terms used in the Mercurial merge module.

2.1. Updating

When a local file or symlink conflicts with a remote file or symlink:

When a local file conflicts with a remote directory:

When a local directory conflicts with a remote file or symlink:

For these path conflicts Mercurial should have the same behaviour as file conflicts.

2.2. Merging

This applies for all kinds of merging, including rebasing and grafting.

When a local file or symlink conflicts with a remote file or symlink:

When a local file conflicts with a remote directory:

When a local symlink conflicts with a remote directory:

When a local directory conflicts with a remote file or symlink:

For these path conflicts, Mercurial should instead prompt the user to merge the file and directory, or leave the working copy in a merge conflict state that can be resolved.

The goal of this plan is to fix all the bugs listed above.

3. The Solution

We introduce path conflicts as a new kind of conflict that can be detected during update and must be handled during merge. These are manifest-level conflicts, and so are detected during manifest merge.

During update, we also take care to check for conflicting directories as well as conflicting paths, and handle them accordingly.

3.1. Example

Consider the following scenario.

Files in the local revision:

a/b

Files in the remote revision:

a/b/c1
a/b/c2
a/b/d/e

When merging from the remote revision to the local revision we expect to be informed of a conflict at a/b. The working copy is left in the following state:

a/b~localhash
a/b/c1
a/b/c2
a/b/d/e

The file is renamed to include the hash of the local revision. If the local file is a modified tracked file in the working copy, e.g. when updating to a new commit with changed but uncommitted files, then the local hash is the hash of the commit with a + appended (the same convention used elsewhere to describe a modified working copy). If there is already a tracked file of that name, then it is additionally suffixed with ~N, where N is the smallest number for which a tracked file does not already exist.
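The naming rule above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function name safename, its parameters, and the choice to start counting N at 1 are assumptions made for this example.

```python
def safename(path, hexnode, tracked, modified=False):
    """Return a non-conflicting name for ``path`` (hypothetical helper).

    ``hexnode`` is the (abbreviated) commit hash, ``tracked`` is the set of
    tracked file names, and ``modified`` says whether the file comes from a
    working copy with uncommitted changes.
    """
    # A modified working copy is described by the commit hash plus "+".
    suffix = hexnode + ("+" if modified else "")
    candidate = "%s~%s" % (path, suffix)
    if candidate not in tracked:
        return candidate
    # Assumption: N starts at 1 and grows until the name is free.
    n = 1
    while "%s~%d" % (candidate, n) in tracked:
        n += 1
    return "%s~%d" % (candidate, n)
```

For example, safename("a/b", "1234abcd", set()) gives a/b~1234abcd, and the same call with modified=True gives a/b~1234abcd+.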

It is always the file that is renamed. Since Mercurial tracks files rather than directories, renaming the directory would involve propagating the rename to all the files inside that directory, which would create lots of noise and make it harder to keep track of the copies, particularly if the user renames the directory back to the original name.

The path conflict must be resolved by deleting or explicitly adding the renamed file, or by renaming it or the directory to a more suitable name. Once resolved, the user must run hg resolve --mark on the original conflicting path.

The situation is similar for the case where a remote file conflicts with a local directory, except that the file is renamed to include the remote revision hash.

This behaviour is similar to Git's. Git uses branch names in preference to commit hashes; for simplicity, however, we will only use hashes.

4. Implementation Details

4.1. Merging Manifests

In merge.manifestmerge we need to add a new merge action representing creation of a path conflict. When a file exists only on one side, we check if it has any path conflicts (i.e. a directory with the same name, or a file that matches any of its path prefixes), and if so, create a new merge action of the form:

('p', (renamedfilename, origin), "path conflict")

The conflict is always listed against the shortest path, i.e. the path that is a file on one side and a directory on the other. The renamedfilename parameter is the safe name that was created by appending a commit hash, and the origin parameter is 'l' for a local path conflict (i.e., one where the file was on the local side) and 'r' for a remote path conflict.
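As a rough sketch of the detection step, the following works on plain sets of file names rather than real manifests. The function names, the hash arguments, and the simplified renaming (no + suffix, no ~N handling) are assumptions made for this illustration; merge.manifestmerge itself is not shown.

```python
def parentdirs(path):
    """Yield every proper directory prefix of ``path``, shortest first."""
    parts = path.split("/")
    for i in range(1, len(parts)):
        yield "/".join(parts[:i])


def pathconflictactions(localfiles, remotefiles, localhash, remotehash):
    """Return {path: action} for files that are directories on the other side.

    Hypothetical helper: both arguments are sets of file names, and the
    safe name is built by simply appending "~" and a hash.
    """
    localdirs = {d for f in localfiles for d in parentdirs(f)}
    remotedirs = {d for f in remotefiles for d in parentdirs(f)}
    actions = {}
    for f in sorted(localfiles):
        # Local file whose path is a directory on the remote side.
        if f not in remotefiles and f in remotedirs:
            actions[f] = ("p", ("%s~%s" % (f, localhash), "l"), "path conflict")
    for f in sorted(remotefiles):
        # Remote file whose path is a directory on the local side.
        if f not in localfiles and f in localdirs:
            actions[f] = ("p", ("%s~%s" % (f, remotehash), "r"), "path conflict")
    return actions
```

Because only files are ever looked up in the other side's directory set, the conflict is reported once, against the shortest conflicting path, rather than once per file inside the directory.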

We also create a merge action for performing the rename to the safe name. For conflicts where the remote file is the one that conflicts, we can re-use the 'dg' action to perform a renamed get action. For conflicts where it is the local file that conflicts, we introduce a new merge action representing a rename for path conflict resolution. This takes the form:

('pr', (oldfilename,), "local path conflict")

We always leave the repository in the conflicted state, and it is up to the user to resolve the conflict by deleting or renaming files, and then marking the path conflict as resolved by running 'hg resolve --mark' on the conflicting path.

4.2. Merge Conflict State

The merge.mergestate class is extended with a new record type:

P: a path conflict to be resolved

Merge state records use capital letters to signify that versions of Mercurial that do not understand the merge record must abort and refuse to process the merge state, and lower case letters to signify advisory merge state records that are safe to ignore. This record uses a capital letter so that old versions of Mercurial will refuse to process it.
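The capital-letter convention can be illustrated with a small sketch. The exception class, the function name, and the set of known record types used in the example are illustrative assumptions, not Mercurial's actual code or its real list of record types.

```python
class UnsupportedMergeRecords(Exception):
    """Raised when mandatory records of an unknown type are present."""


def checkrecords(records, known):
    """Return the understood records, aborting on unknown mandatory ones.

    ``records`` is a list of (type, data) pairs and ``known`` is the set of
    record types this (hypothetical) version understands.
    """
    unsupported = sorted(
        {rtype for rtype, _data in records
         if rtype not in known and not rtype.islower()}
    )
    if unsupported:
        # Unknown capital-letter records must not be ignored.
        raise UnsupportedMergeRecords(", ".join(unsupported))
    # Unknown lower-case records are advisory and are silently dropped.
    return [(rtype, data) for rtype, data in records if rtype in known]
```

A version that does not list P among its known types raises on any P record, which is the behaviour the new record relies on.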

The record has a sub-type of either 'pu' or 'pr' depending on whether or not it has been marked as resolved. These are analogous to the 'u' and 'r' sub-types of normal file conflicts. The record also contains the path that conflicts, and the path that the renamed file was renamed to, as well as whether it was a local or remote file that was renamed. The path conflict is deemed resolved when the user runs hg resolve --mark on the original conflicting path.
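One way to picture the contents of such a record is the sketch below. The PathConflict tuple, its field names, and the helper functions are hypothetical; the on-disk encoding of the record is not specified here.

```python
from collections import namedtuple

# Hypothetical in-memory form of a 'P' record:
#   state   - "pu" (unresolved) or "pr" (resolved)
#   path    - the original conflicting path
#   renamed - the safe name the file was renamed to
#   origin  - "l" if the local file was renamed, "r" if the remote one was
PathConflict = namedtuple("PathConflict", ["state", "path", "renamed", "origin"])


def markresolved(conflicts, path):
    """Return a copy of ``conflicts`` with ``path`` marked as resolved."""
    return [c._replace(state="pr") if c.path == path else c for c in conflicts]


def unresolvedpaths(conflicts):
    """Return the conflicting paths that still need to be resolved."""
    return [c.path for c in conflicts if c.state == "pu"]
```

In this sketch, running hg resolve --mark on a/b corresponds to calling markresolved(conflicts, "a/b"), after which a/b no longer appears in unresolvedpaths(conflicts).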

4.3. Updating the Working Copy

In merge.applyupdates, path conflicts for local files are dealt with by renaming the file, adding the original commit hash as a suffix. This is the new 'pr' action.
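The effect of the rename on the working copy can be sketched with plain filesystem calls, reusing the scenario from the example section. The helper names and the hash 1234abcd are made up for this illustration, and none of this is the code in merge.applyupdates.

```python
import os
import tempfile


def applylocalrename(root, path, renamed):
    """Move the conflicting local file out of the way (the 'pr' step)."""
    os.rename(os.path.join(root, path), os.path.join(root, renamed))


def writefile(root, path, data):
    """Write ``data`` to ``path`` under ``root``, creating directories."""
    full = os.path.join(root, path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as fp:
        fp.write(data)


def listfiles(root):
    """Return all files under ``root`` as sorted, '/'-separated paths."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def demo():
    """Replay the example: local file a/b, remote directory a/b."""
    with tempfile.TemporaryDirectory() as root:
        writefile(root, "a/b", "local contents\n")
        # Rename first, so that a/b can then be created as a directory.
        applylocalrename(root, "a/b", "a/b~1234abcd")
        for remote in ("a/b/c1", "a/b/c2", "a/b/d/e"):
            writefile(root, remote, "remote contents\n")
        return listfiles(root)
```

The ordering matters: the remote files can only be written once the local file no longer occupies the path a/b.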

In merge._checkunknownfile, we additionally check the following:

5. Concerns

5.1. Checking dirs in manifests

In order to determine whether a file conflicts with directories in the other manifest, we must query the other manifest to see if it contains a particular directory. We can do that with othermanifest.hasdir(dirname). For flat manifests, however, this works by building a util.dirs object, which may be expensive to build for repos with large manifests.

It should be possible to improve the implementation of manifest.hasdir() to be more efficient for flat manifests by binary searching for any file that begins with dirname + sep.
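The binary search idea can be sketched over a sorted list of file names. This is an illustration only: it assumes the flat manifest can be viewed as a list sorted in plain string order with / as the separator, and it does not reflect the actual manifest data structures.

```python
import bisect


def hasdir(sortedfiles, dirname):
    """Return True if any file in ``sortedfiles`` lives under ``dirname``.

    ``sortedfiles`` must be sorted in plain string order. All names that
    start with ``dirname + "/"`` form one contiguous run, so it is enough
    to look at the first entry that is not smaller than that prefix.
    """
    prefix = dirname + "/"
    i = bisect.bisect_left(sortedfiles, prefix)
    return i < len(sortedfiles) and sortedfiles[i].startswith(prefix)
```

Each query costs a logarithmic number of string comparisons, instead of requiring a directory set to be built for the whole manifest up front.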

Tree manifest already has an efficient implementation that simply looks for the tree manifest directory node.

5.2. Check files within dirs in manifests

Similarly, when applying merge actions to update from a revision where a path is a directory to one where that path is a file, we must make sure that the set of actions has deleted all of the files in the directory. This involves checking all files in a directory, which currently requires iterating over the whole manifest. For large repos, this is slow.
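The check itself might look like the sketch below. It is not part of the plan: it assumes the manifest is available as a sorted list of names, that the set of removed files is known, and that a binary search for the start of the directory's run of files is acceptable. The function names are hypothetical.

```python
import bisect


def filesindir(sortedfiles, dirname):
    """Return the files under ``dirname`` from a sorted list of names."""
    prefix = dirname + "/"
    i = bisect.bisect_left(sortedfiles, prefix)
    result = []
    # Files under the directory are contiguous in sorted order.
    while i < len(sortedfiles) and sortedfiles[i].startswith(prefix):
        result.append(sortedfiles[i])
        i += 1
    return result


def leftoverfiles(sortedfiles, dirname, removed):
    """Return files under ``dirname`` that the actions do not remove."""
    return [f for f in filesindir(sortedfiles, dirname) if f not in removed]
```

An empty result from leftoverfiles means the directory can safely be replaced by a file. Under these assumptions the cost depends on the size of the directory rather than on the size of the whole manifest.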


CategoryDeveloper CategoryNewFeatures

PathConflictsPlan (last edited 2017-09-20 15:50:55 by MarkThomas)