CaseFolding - Mercurial

Note:

This page is primarily intended for developers of Mercurial.

Note:

This page is no longer relevant but is kept for historical purposes.

This page is to discuss the problem of interoperating between case-sensitive and case-insensitive filesystems.

The state of this page is unknown and it is mostly outdated. Mercurial now uses by default a storage format in which case collisions in the history aren't a problem on Windows, and updating to a problematic revision fails "nicely".

The Problem

If a repository contains history for "A" and then pulls a changeset containing "a", case-insensitive file systems will see this as a collision. The result (in 1.1.2) is that:

The user will be unable to create or update a working copy.
The user can successfully merge and commit a merge that prevents future merges, because on a case-sensitive file system, Mercurial fails if the second parent has a case collision. If that merge is not backed out, a case-insensitive file system will be left with multiple unmergeable heads unless the user delves into commands that modify history (see FixingCaseCollisions if you need help doing this right now).
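
The collision described above can be detected ahead of time by grouping file names that differ only by case. The following is a minimal sketch, not code from Mercurial itself: the function name `find_case_collisions` is hypothetical, and `str.casefold()` is only an approximation of the folding rules that a real case-insensitive filesystem applies.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only by case.

    Assumption: str.casefold() stands in for the filesystem's own
    case-folding rules, which may differ in detail.
    """
    groups = defaultdict(set)
    for path in paths:
        # Paths with the same folded form would map to one file
        # on a case-insensitive filesystem.
        groups[path.casefold()].add(path)
    # Keep only the folded names that are claimed by more than one spelling.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


collisions = find_case_collisions(["A", "a", "src/main.c", "README"])
print(collisions)
```

Because the whole path is folded, a difference in the case of a directory name (for example "Docs/x" versus "docs/x") is reported as a collision too.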

Another easy fix (easy if you have Linux access) is to clone the repository to a case-sensitive file system, do the merge, commit, then pull the merge back to the Windows machine.

Notice: you can do all this on a Windows machine. You don't need multiple OSes, but it sure helps:

Most Linux & UNIX hard disks have a case-sensitive file system, but...
Linux & Unix users routinely encounter case-insensitive filesystems when they use flash drives or SMB shares.
All Windows filesystems are case-insensitive, but...
Windows users routinely encounter case-sensitive file systems when they collaborate on cross-platform projects or transfer web content to a server.
The OS X HFS+ filesystem is case-insensitive by default, but can be made case-sensitive...and can therefore experience all the exceptions mentioned above.
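
Since the operating system alone does not tell you how a given directory treats case, one way to find out is to probe it. The sketch below is an illustration under stated assumptions, not Mercurial's own check: the function name `is_case_sensitive` and the `casecheck_` prefix are hypothetical. It creates a temporary file and tests whether the same name with swapped case also resolves.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` appears to be on a case-sensitive filesystem.

    Assumption: no unrelated file with the case-swapped name already exists,
    which is very unlikely given the random temporary name.
    """
    with tempfile.NamedTemporaryFile(prefix="casecheck_", dir=directory) as handle:
        folder, name = os.path.split(handle.name)
        # The prefix contains letters, so the swapped name always differs.
        swapped = os.path.join(folder, name.swapcase())
        # On a case-insensitive filesystem the swapped name finds the same file.
        return not os.path.exists(swapped)


print(is_case_sensitive(tempfile.gettempdir()))
```

The result describes only the directory that was probed; a flash drive or SMB share mounted elsewhere on the same machine may answer differently.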

Proposals

See CaseFoldingPlan for the current plan.
Precommit Hook
- The CasestopExtension works to prevent committing a potential filename collision, but it has to be enabled individually in the hgrc of every repository, user, or machine.
Filename Canonization Hook
- File names should be encoded/decoded in hooks the same way file contents can be now. For case-insensitive filesystems the filename.caseinsensitive hook (or whatever it ends up being called) would be enabled by default, but you could add it to your hgrc if you want the same behavior on a different system. This depends on extending the hook mechanism to call predefined functions, which may be generally useful for things like expanding RCS keyword strings, too.
FixcaseExtension
- This extension works to coerce the case of files to match the manifest. This is a straight and simple approach that will suffice in many environments. It is not clear how it handles collaborating users who separately add files whose names collide by case.
CaseFoldPlugin
- The CaseFoldPlugin proposal attempts to customize the handling of case collisions.
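
The idea of coercing file names to the case recorded in the manifest can be sketched in a few lines. This is not the FixcaseExtension code, only an illustration of the approach: `coerce_to_manifest_case` is a hypothetical name, the manifest is modeled as a plain list of paths, and `str.casefold()` is assumed to approximate the filesystem's folding.

```python
def coerce_to_manifest_case(manifest, working_names):
    """Map each working-copy name to the spelling recorded in the manifest.

    Names with no case-insensitive match in the manifest are left unchanged.
    Assumption: if the manifest itself contains a case collision, the first
    spelling listed wins.
    """
    canonical = {}
    for name in manifest:
        canonical.setdefault(name.casefold(), name)
    return {name: canonical.get(name.casefold(), name) for name in working_names}


renames = coerce_to_manifest_case(
    ["README.txt", "src/Main.c"],
    ["readme.TXT", "src/Main.c", "new.txt"],
)
print(renames)
```

The sketch also shows the open question from the list above: when two users separately add names that collide by case, neither name is in the manifest yet, so there is nothing to coerce them to.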

A Little Deeper

Early filesystems weren't case-insensitive (CI); they were case-coercing. Many machines only displayed and input upper case (early Apples, MS-DOS, FORTH machines, LISP machines). So it only made sense that their directories stored filenames in uppercase.

Then along came the shift key and displays that could display lowercase. Mixed-case began to pop up in source code and filenames. Filesystems and computer languages both had to make the choice: is case just a visual difference or a semantic one?

The decision is a fundamental one, and like most fundamental choices it makes sense in its own universe. But the distinction is painful when universes overlap. If you've ever ported code to or from a case-sensitive (CS) language, you know the pain. In one direction, you get variable overlap: an algorithm that (sloppily?) distinguished "Tree" and "tree" has problems. In the other direction, inconsistent case can produce variables that aren't found.

Same thing happens when your Windows software decides to change ".jpg" to ".JPG". You sync your files to a Linux server and wonder why your photos disappear. I use "Hawaii.jpg" and "hawaii.jpg" to distinguish photo from thumbnail...and my buddy can't check out a working copy on his Windows box.

Point is, CI vs. CS is a fundamental difference: do two names differing only by case refer to the same thing or to two different things? In one case, you have a conflict to be merged; in the other, you just have an additional file.

And you can't know which choice is correct simply by looking at the current file system. You have to know things like, "Will people need working copies of this project on a CI filesystem? Will the files be used by CS applications? Will they be modified by case-careless applications?"

(As noted in CaseFoldingPlan, case folding is also part of the larger problem of name collision, which can also result from dot-notation and Unicode representation.)

Conclusions

This isn't ideal, but wishes aside, we're going to have to deal with it for a while.
As a file store, Mercurial is case-sensitive.
...but it routinely deals with filesystems that aren't (part of its popularity?)
...and worst (best?) of all, by its very nature, it fosters interaction between filesystems that deal with case differently.

(Prediction: distributed version control will be the application that entices all desktop filesystems to become case-sensitive. Could be wrong. But you heard it here first, folks.)

CategoryWindows CategoryDeveloper