Consequences for use of hg for applications other than SCM (was: Re: German umlauts in file names)
Hans Meine
meine at informatik.uni-hamburg.de
Mon Jun 23 08:15:47 CDT 2008
On Monday, 23 June 2008 14:38:15, Alexander Belchenko wrote:
> The Windows filesystem uses Unicode for saving filenames.
> So if svn properly uses the Unicode Win32 API, there are absolutely no problems.
As Matt wrote, hg does *not* use the Unicode API (which is also available in
Python, see the link I posted above), but only the 8-bit functions. As a
result, Unicode filenames cannot be preserved. IMO this qualifies as a bug.
OK, call it documented, clean behavior that is nevertheless unexpected (and
undesired) for certain users and that cannot be changed.
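To make the difference between the two API flavors concrete, here is a minimal sketch in present-day Python 3 (not the Python of this thread, and not hg code). os.listdir() returns text names when given a text path and raw byte names when given a bytes path. The helper name list_both_ways and the sample filename are just assumptions for illustration.

```python
import os
import tempfile

def list_both_ways(directory):
    """Return (text_names, byte_names) for the entries of `directory`."""
    # A str path selects the text (Unicode) flavor of the API.
    text_names = sorted(os.listdir(directory))
    # A bytes path selects the 8-bit flavor: names come back as raw bytes.
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file with an umlaut in its name.
    open(os.path.join(tmp, "M\u00e4rz.txt"), "w").close()
    text_names, byte_names = list_both_ways(tmp)
```

With the byte flavor, what a name "means" depends entirely on which encoding you assume for those bytes, which is exactly the problem discussed here.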
However, I think this should not be hard to fix for people like Marko. Since
backwards compatibility is definitely important, and people like Matt would
probably be opposed to an unconditional switch to Unicode (which would lead
to problems for other people, as expressed in this thread), I think the
easiest change would be to have a "unicode filename" switch that would treat
all filenames in the repo as being UTF-8 encoded. This way, the repo format
stays the same (i.e. only 8-bit filenames are used), but whenever a filename
is "applied to" or "fetched from" the local filesystem, it would need to be
re-encoded if sys.getfilesystemencoding() != "utf-8".
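A rough sketch of what such a switch might do at the filesystem boundary, assuming repo names are UTF-8 byte strings. The function names repo_to_local and local_to_repo are made up for this mail and are not part of Mercurial; the fs_encoding parameter exists only so the conversion can be tried without changing the system locale.

```python
import codecs
import sys

REPO_ENCODING = "utf-8"  # assumption: the switch declares repo names to be UTF-8

def _is_utf8(encoding):
    # Normalize aliases such as "UTF8" or "utf_8" before comparing.
    return codecs.lookup(encoding).name == "utf-8"

def repo_to_local(repo_name, fs_encoding=None):
    """Re-encode a UTF-8 repo filename (bytes) for the local filesystem."""
    fs_encoding = fs_encoding or sys.getfilesystemencoding()
    if _is_utf8(fs_encoding):
        return repo_name  # nothing to do, bytes are passed through unchanged
    return repo_name.decode(REPO_ENCODING).encode(fs_encoding)

def local_to_repo(local_name, fs_encoding=None):
    """Re-encode a local filename (bytes) to the UTF-8 form stored in the repo."""
    fs_encoding = fs_encoding or sys.getfilesystemencoding()
    if _is_utf8(fs_encoding):
        return local_name
    return local_name.decode(fs_encoding).encode(REPO_ENCODING)
```

For example, repo_to_local("März.txt".encode("utf-8"), "latin-1") gives the Latin-1 byte string for the same name, and local_to_repo reverses it. The repo itself only ever sees 8-bit names, so the format does not change.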
IMHO the case that the local filesystem encoding is incompatible with a
filename from the repo could simply throw an error - that would only occur if
someone explicitly told Mercurial to convert filenames when this is
impossible. (The user could then clone the repo without converting the
filenames, assuming the other tools in question can deal with UTF-8
filenames, or change the filesystem / update the OS in case they can't.)
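The error case could look roughly like the following sketch. FilenameEncodingError and checked_repo_to_local are hypothetical names chosen for this example, not existing Mercurial API; the point is only that an unrepresentable name fails loudly instead of being silently mangled.

```python
class FilenameEncodingError(Exception):
    """Hypothetical error: a repo filename cannot be converted for the local filesystem."""

def checked_repo_to_local(repo_name, fs_encoding):
    """Convert a UTF-8 repo filename (bytes) to `fs_encoding`, or raise."""
    try:
        text = repo_name.decode("utf-8")
    except UnicodeDecodeError as exc:
        # The repo was declared UTF-8, but this name is not valid UTF-8.
        raise FilenameEncodingError(
            "%r is not valid UTF-8" % (repo_name,)) from exc
    try:
        return text.encode(fs_encoding)
    except UnicodeEncodeError as exc:
        # The local encoding has no representation for some character.
        raise FilenameEncodingError(
            "%r cannot be represented in %s" % (text, fs_encoding)) from exc
```

An umlaut name converts fine to cp1252, whereas a Japanese filename would raise FilenameEncodingError there, at which point the user can fall back to the options mentioned above.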
--
Ciao, / /
/--/
/ / ANS