FixUtf8 Extension

This extension is not distributed with Mercurial.

Author: Stefan Rusek

Repository: https://bitbucket.org/stefanrusek/hg-fixutf8/

Compatibility: requires Mercurial 1.1 or later and Python 2.5 or later

This extension is still in beta; use it at your own risk.

1. Overview

This extension corrects filename encoding problems on Windows.

Windows internally stores all command line arguments and filenames in Unicode UTF-16 (16-bit character strings) and, for backward compatibility with Windows 3.x, provides functions to retrieve them as non-Unicode 8-bit character strings. Python 2.x and Mercurial call the non-Unicode functions, which causes Mercurial to misbehave when used with filenames that contain Unicode characters. This extension resolves the issue by making sure that the Unicode functions are called instead. Since Mercurial expects 8-bit character strings, the extension converts the strings to UTF-8 before returning them to Mercurial.
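The difference between the two conversions can be illustrated with a small, platform-independent sketch. It is not taken from the extension's source: the helper names wide_to_utf8 and wide_to_ansi are hypothetical, cp1252 is assumed as the 8-bit code page, and Python's "replace" error handler only approximates how Windows substitutes characters it cannot represent.

```python
# Illustrative sketch only; not the extension's actual code.
# Helper names and the cp1252 code page are assumptions.

def wide_to_utf8(wide_bytes):
    """Convert a UTF-16-LE byte string to UTF-8 without losing characters."""
    return wide_bytes.decode("utf-16-le").encode("utf-8")


def wide_to_ansi(wide_bytes, codepage="cp1252"):
    """Convert a UTF-16-LE byte string to an 8-bit code page.

    Characters missing from the code page are replaced with "?",
    which roughly mimics the lossy result of the non-Unicode functions.
    """
    return wide_bytes.decode("utf-16-le").encode(codepage, "replace")


# A filename with three CJK characters, written with escapes so the
# example does not depend on the source file's encoding.
name = u"\u65e5\u672c\u8a9e.txt"
wide = name.encode("utf-16-le")      # how Windows holds the name internally

utf8_name = wide_to_utf8(wide)       # round-trips back to the original name
ansi_name = wide_to_ansi(wide)       # the CJK characters become "?"

print(repr(utf8_name))
print(repr(ansi_name))
```

In this sketch the UTF-8 result can be decoded back to the original name, while the code-page result cannot, which is the kind of loss that makes such filenames unusable without the extension.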

There is one case where FixUtf8 cannot add Unicode support: the repository object for the current working directory is created before extensions are loaded, so FixUtf8 cannot help when the repository itself resides within a directory whose path contains Unicode characters. Directories with Unicode characters inside the repository are not a problem.

Ideally, you enable the extension before you need international filenames, but if you already have international filenames in your repo, then you need to fix your filenames.

In order for Unicode characters to display properly, you should change the Windows console font from "Raster Fonts" to "Lucida Console".

2. Fixing existing filenames

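To fix your filenames, run the following commands. The -s 100 option makes addremove record a file as renamed only when the old and new files have identical contents: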

>hg addremove -s 100
>hg commit -m "Fix filenames"

3. Configuration

Configure your .hgrc to enable the extension by adding the following lines:

[extensions]
fixutf8 = path/to/fixutf8.py
