Differences between revisions 3 and 4
Revision 3 as of 2017-08-24 15:48:22
Size: 9928
Editor: MarkThomas
Comment: Clarify reason for renaming the file in both conflict cases.
Revision 4 as of 2017-09-20 15:50:55
Size: 10271
Editor: MarkThomas
Comment: Updated based on plan changes made during development
Note:

This page is primarily intended for developers of Mercurial.

Path Conflicts

Status: Early Proposal

Main proponents: MarkThomas

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

1. The Problem

Path conflicts occur during merges when one side of the merge has a file or link that has the same path name as a directory containing one or more files or links on the other side. These can occur in either direction: the local repository might have a file that is a directory on the remote, or vice versa. In general, we say that the path conflict occurs at the shortest path, i.e. the path at which a file on one side conflicts with a directory containing one or more files on the other side.

Currently Mercurial deals only with files, and does not handle path conflicts at all. This can lead to serious bugs, particularly when the conflict occurs at a symlink to a directory. The problems occur in two places in particular: updating and merging.

An example bug that is caused by this: issue5628

2. Current Behaviour

In this document, “local” means the target revision of a merge, or the working copy in the case of update, and “remote” means the source revision. These are the terms used in the Mercurial merge module.

2.1. Updating

When a local file or symlink conflicts with a remote file or symlink:

  • If the local file is unknown or ignored, we follow the merge.checkunknown and merge.checkignored config options and either abort, warn, or proceed with updating to the new file.

  • If the local file is tracked but not committed, the update aborts with a merge conflict.

When a local file conflicts with a remote directory:

  • In all cases we abort with “Not a directory”, and the working copy is left in an unfinished update state. This is a bug.

When a local directory conflicts with a remote file or symlink:

  • If the local directory is empty, it is deleted and replaced with the remote file or symlink.
  • If the local directory is not empty, we abort with “Non-empty directory” and the working copy is left in an unfinished update state. This is a bug.

For these path conflicts Mercurial should have the same behaviour as file conflicts.

2.2. Merging

This applies for all kinds of merging, including rebasing and grafting.

When a local file or symlink conflicts with a remote file or symlink:

  • The user is prompted to merge the files. In the case of a conflict involving a symlink, the user is prompted to select whether they want to use the local or remote file or symlink.
  • If the merge fails, the working copy is left in the merge conflict state. Fixing the merge conflict allows the merge to be completed.

When a local file conflicts with a remote directory:

  • The merge aborts with “Not a directory”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. This is a bug.

When a local symlink conflicts with a remote directory:

  • The merge aborts with “path traverses symbolic link”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. This is a bug.

When a local directory conflicts with a remote file or symlink:

  • The merge aborts with “Non-empty directory”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. This is a bug.

For these path conflicts, Mercurial should instead prompt the user to merge the file and directory, or leave the working copy in a merge conflict state that can be resolved.

The goal of this plan is to fix all the bugs listed above.

3. The Solution

We introduce path conflicts as a new kind of conflict that can be detected during update and must be handled during merge. These are manifest-level conflicts, and so are detected during manifest merge.

During update, we also take care to check for conflicting directories as well as conflicting paths, and handle them accordingly.

3.1. Example

Consider the following scenario.

Files in the local revision:

a/b

Files in the remote revision:

a/b/c1
a/b/c2
a/b/d/e

When merging from the remote revision to the local revision we expect to be informed of a conflict at a/b. The working copy is left in the following state:

a/b~localhash
a/b/c1
a/b/c2
a/b/d/e

The file is renamed to include the hash of the local revision. If the local revision is a modified tracked file in the working copy, e.g. when updating to a new commit with changed but uncommitted files, then the local hash is the hash of the commit with a + appended (as in other descriptions of a modified working copy). If there is already a tracked file of that name, then it is additionally suffixed with ~N, where N is the smallest number for which no tracked file of that name exists.
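The naming rule above can be sketched as follows. This is a minimal illustration, not Mercurial's implementation: the function names safe_name and local_tag are hypothetical, the set of tracked files is passed in explicitly, and numbering is assumed to start at 1.

```python
import itertools


def local_tag(commit_hash, modified):
    """Return the suffix for a local file: the hash, plus '+' if modified."""
    return commit_hash + "+" if modified else commit_hash


def safe_name(path, tag, tracked):
    """Return `path~tag`, adding `~N` while that name is already tracked.

    `tracked` is the set of tracked file names (an assumption of this
    sketch). N is the smallest positive integer that yields a free name.
    """
    candidate = "%s~%s" % (path, tag)
    if candidate not in tracked:
        return candidate
    for n in itertools.count(1):
        numbered = "%s~%d" % (candidate, n)
        if numbered not in tracked:
            return numbered


# Example: the conflicting file a/b from a modified working copy.
name = safe_name("a/b", local_tag("abc123", True), tracked={"a/b"})
```

Here name is "a/b~abc123+", since no tracked file of that name exists in the example.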

It is always the file that is renamed. Since Mercurial tracks files rather than directories, renaming the directory would involve propagating the rename to all the files inside that directory, which would create lots of noise and make it harder to keep track of the copies, particularly if the user renames the directory back to the original name.

The path conflict must be resolved by deleting or explicitly adding the renamed file, or by renaming it or the directory to a more suitable name. Once resolved, the user must run hg resolve --mark on the original conflicting path.

The situation is similar for the case where a remote file conflicts with a local directory, except that the file is renamed to include the remote revision hash.

This behaviour is similar to git. Git will use branch names in preference to commit hashes; for simplicity, we will only use hashes.

4. Implementation Details

4.1. Merging Manifests

In merge.manifestmerge we need to add a new merge action representing creation of a path conflict. When a file exists only on one side, we check if it has any path conflicts (i.e. a directory with the same name, or a file that matches any of its path prefixes), and if so, create a new merge action of the form:

('p', (renamedfilename, origin), "path conflict")

The conflict is always listed against the shortest path, i.e. the path that is a file on one side and a directory on the other. The renamedfilename parameter is the safe name that was created by appending a commit hash, and the origin parameter is 'l' for a local path conflict (i.e., one where the file was on the local side) and 'r' for a remote path conflict.
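As a rough sketch of the detection step, the following treats each manifest as a plain set of file paths and emits 'p' actions keyed by the conflicting path. The helper names (path_prefixes, dirs_of, find_path_conflicts) and the way the hash suffixes are passed in are assumptions made for illustration; the real merge.manifestmerge works on manifest objects and produces many other actions as well.

```python
def path_prefixes(path):
    """Yield every proper directory prefix of `path`, shortest first."""
    parts = path.split("/")
    for i in range(1, len(parts)):
        yield "/".join(parts[:i])


def dirs_of(files):
    """Return the set of all directories implied by a set of file paths."""
    dirs = set()
    for f in files:
        dirs.update(path_prefixes(f))
    return dirs


def find_path_conflicts(local_files, remote_files, local_tag, remote_tag):
    """Map each conflicting path to a ('p', ...) action.

    A file on one side conflicts when the other side has a directory of
    the same name. Checking this from both sides also covers the case
    where a file on the other side matches one of our path prefixes.
    """
    actions = {}
    sides = (
        ("l", local_files, remote_files, local_tag),
        ("r", remote_files, local_files, remote_tag),
    )
    for origin, mine, theirs, tag in sides:
        their_dirs = dirs_of(theirs)
        for f in sorted(mine):
            if f in their_dirs:
                renamed = "%s~%s" % (f, tag)
                actions[f] = ("p", (renamed, origin), "path conflict")
    return actions


# The scenario from the example section: a/b is a file locally and a
# directory on the remote side.
conflicts = find_path_conflicts(
    {"a/b"}, {"a/b/c1", "a/b/c2", "a/b/d/e"}, "localhash", "remotehash"
)
```

For this input the only entry is for a/b, with the renamed file a/b~localhash and origin 'l'.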

We also create a merge action for performing the rename to the safe name. For conflicts where the remote file is the one that conflicts, we can re-use the 'dg' action to perform a renamed get action. For conflicts where it is the local file that conflicts, we introduce a new merge action representing a rename for path conflict resolution. This takes the form:

('pr', (oldfilename,), "local path conflict")

We always leave the repository in the conflicted state, and it is up to the user to resolve the conflict by deleting or renaming files, and then marking the path conflict as resolved by running 'hg resolve --mark' on the conflicting path.

4.2. Merge Conflict State

The merge.mergestate class is extended with a new record type:

P: a path conflict to be resolved

Merge state records use capital letters to signify that versions of Mercurial that do not understand the merge record must abort and refuse to process the merge state, and lower case letters to signify advisory merge state records that are safe to ignore. This record uses a capital letter so that old versions of Mercurial will refuse to process it.

The record has a sub-type of either 'pu' or 'pr' depending on whether or not it has been marked as resolved. These are analogous to the 'u' and 'r' sub-types of normal file conflicts. The record also contains the path that conflicts, and the path that the renamed file was renamed to, as well as whether it was a local or remote file that was renamed. The path conflict is deemed resolved when the user runs hg resolve --mark on the original conflicting path.
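The contents of such a record could be modelled as below. This is only an illustrative data structure: the class name, field names and method are hypothetical, and it says nothing about how merge.mergestate serialises records on disk. Note that 'pr' here is the record sub-type for a resolved conflict, which is distinct from the 'pr' merge action.

```python
from dataclasses import dataclass


@dataclass
class PathConflictRecord:
    """Hypothetical in-memory view of a 'P' merge state record."""

    path: str          # the original conflicting path
    renamed: str       # the safe name the file was renamed to
    origin: str        # 'l' if the local file was renamed, 'r' if remote
    state: str = "pu"  # 'pu' = unresolved, 'pr' = resolved

    def mark_resolved(self):
        """Model the effect of `hg resolve --mark` on the original path."""
        self.state = "pr"

    @property
    def resolved(self):
        return self.state == "pr"


record = PathConflictRecord(path="a/b", renamed="a/b~localhash", origin="l")
was_resolved_before = record.resolved
record.mark_resolved()
```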

4.3. Updating the Working Copy

In merge.applyupdates, path conflicts for local files are dealt with by renaming the file, adding the original commit hash as a suffix. This is the new 'pr' action.

In merge._checkunknownfile, we additionally check the following:

  • If any of the path prefixes of the target file exists as a file or link, we consider this a conflict and abort or warn as appropriate. If we are not aborting then the file is deleted.
  • If the file already exists as a directory, we consider this a conflict and abort or warn as appropriate. If we are not aborting, the directory and all of its contents are deleted.
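The two additional checks can be sketched against the filesystem as follows. The function name unknown_path_conflict and its return convention are assumptions for illustration; the sketch only reports which working-copy path is in the way, and leaves the abort, warn or delete decision to the caller.

```python
import os


def unknown_path_conflict(root, target):
    """Return the path under `root` that blocks `target`, or None.

    `target` is a repository-relative path using '/' separators.
    """
    parts = target.split("/")
    # Check 1: a path prefix of the target exists as a file or link.
    for i in range(1, len(parts)):
        full = os.path.join(root, *parts[:i])
        if os.path.islink(full) or os.path.isfile(full):
            return "/".join(parts[:i])
    # Check 2: the target itself already exists as a real directory.
    full = os.path.join(root, *parts)
    if os.path.isdir(full) and not os.path.islink(full):
        return target
    return None
```

For example, if the working copy contains an unknown file a/b, then checking the target a/b/c1 reports a/b as the conflicting path.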

5. Concerns

5.1. Checking dirs in manifests

In order to determine whether a file conflicts with directories in the other manifest, we must query the other manifest to see if it contains a particular directory. We can do that with othermanifest.hasdir(dirname). For flat manifests, however, this works by building a util.dirs object, which may be expensive to build for repos with large manifests.

It should be possible to improve the implementation of manifest.hasdir() to be more efficient for flat manifests by binary searching for any file that begins with dirname + sep.
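The binary-search idea can be illustrated on a plain sorted list of file paths. This is a sketch under the assumption that the flat manifest's file names are available as a lexicographically sorted sequence with '/' as the separator; the standalone hasdir function here is hypothetical and is not the manifest method itself.

```python
import bisect


def hasdir(sorted_files, dirname):
    """Return True if any file in `sorted_files` lies inside `dirname`.

    All names starting with `dirname + "/"` sort contiguously, beginning
    at the insertion point of that prefix, so one lookup is enough.
    """
    prefix = dirname + "/"
    i = bisect.bisect_left(sorted_files, prefix)
    return i < len(sorted_files) and sorted_files[i].startswith(prefix)


files = sorted(["a/b/c1", "a/b/c2", "a/b/d/e", "a/bc", "ab"])
found = hasdir(files, "a/b")
```

This costs O(log n) string comparisons per query instead of a pass over the whole manifest. In the example, a/b is reported as a directory, while a/bc is not, because it only exists as a file.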

Tree manifest already has an efficient implementation that simply looks for the tree manifest directory node.

5.2. Check files within dirs in manifests

Similarly, when applying merge actions to update from a revision where a path is a directory to one where that path is a file, we must make sure that the set of actions has deleted all of the files in the directory. This involves checking all files in a directory, which currently requires iterating over the whole manifest. For large repos, this is slow.


CategoryDeveloper CategoryNewFeatures

PathConflictsPlan (last edited 2017-09-20 15:50:55 by MarkThomas)