Rust: the issue of filenames and Python interop

Yuya Nishihara yuya at tcha.org
Tue Aug 27 09:37:05 EDT 2019


On Mon, 26 Aug 2019 18:17:57 +0200, Raphaël Gomès wrote:
> From the perspective of our pure-rust "hg-core" library, we should be
> using https://doc.rust-lang.org/std/path/struct.Path.html,
> https://doc.rust-lang.org/std/ffi/struct.OsString.html and their owned
> variants to represent paths and filenames.

Using Path/PathBuf is probably fine for paths like repo.root, but we have
other kinds of paths: the ones stored in the changelog/manifest/dirstate. That
data must be byte-transparent, may contain characters that are invalid on some
platforms, and follows different rules (e.g. \ vs / on Windows). So I don't
think it can be backed by Path/PathBuf. Maybe we'll need our own HgPath type
that allows conversion to Path/PathBuf/[u8]/str, etc.
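
To make the idea concrete, here is a rough sketch of what such a type could
look like. Everything in it (the HgPathBuf name, its methods, the choice to
return Option from the conversions) is an assumption for illustration, not an
existing hg-core API. The non-Unix conversion in particular is a placeholder
that only accepts UTF-8.

```rust
use std::path::PathBuf;

/// Hypothetical byte-transparent path, as stored in the manifest/dirstate.
/// The separator is always `/`, whatever the platform.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct HgPathBuf {
    inner: Vec<u8>,
}

impl HgPathBuf {
    /// Stores the bytes as-is, with no validation or re-encoding.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        HgPathBuf { inner: bytes.to_vec() }
    }

    /// Returns the exact bytes that were stored.
    pub fn as_bytes(&self) -> &[u8] {
        &self.inner
    }

    /// Borrows the path as `str`, only if the bytes happen to be valid UTF-8.
    pub fn to_str(&self) -> Option<&str> {
        std::str::from_utf8(&self.inner).ok()
    }

    /// Iterates over the `/`-separated components as raw byte slices.
    pub fn components(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.inner.split(|b| *b == b'/')
    }

    /// On Unix, an OsStr is arbitrary bytes, so the conversion is lossless.
    #[cfg(unix)]
    pub fn to_path_buf(&self) -> Option<PathBuf> {
        use std::ffi::OsStr;
        use std::os::unix::ffi::OsStrExt;
        Some(PathBuf::from(OsStr::from_bytes(&self.inner)))
    }

    /// Placeholder for other platforms (assumption): accept UTF-8 only.
    /// A real implementation would have to decode with the repository codec.
    #[cfg(not(unix))]
    pub fn to_path_buf(&self) -> Option<PathBuf> {
        self.to_str().map(PathBuf::from)
    }
}
```

The point of keeping a Vec<u8> inside is that a filename that is not valid
UTF-8 still round-trips unchanged through as_bytes(), while to_str() simply
reports that it cannot be viewed as a str.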

Regarding Windows filenames, Mercurial uses the ANSI (or MBCS) encoding, whereas
Rust's OsStr is basically Unicode (WTF-8). If WindowsUTF8Plan is implemented,
things will get more complicated: filenames may be either ANSI or UTF-8,
so we'll have to select the codec per repository.

https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
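
A per-repository codec could be modelled roughly like this. The FilenameCodec
and DecodeError names are made up for the example, and the ANSI branch is a
stub: it only handles pure-ASCII names (assuming the code page is
ASCII-compatible) and reports everything else as unsupported, since a real
implementation would need the platform's code page conversion.

```rust
/// Hypothetical per-repository filename encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilenameCodec {
    /// Stored bytes are in the local ANSI/MBCS code page.
    Ansi,
    /// Stored bytes are UTF-8.
    Utf8,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    InvalidUtf8,
    /// Non-ASCII ANSI bytes: not handled by this sketch.
    AnsiUnsupported,
}

impl FilenameCodec {
    /// Decodes stored filename bytes to a Unicode string, according to
    /// the codec selected for the repository.
    pub fn decode(self, stored: &[u8]) -> Result<String, DecodeError> {
        match self {
            FilenameCodec::Utf8 => std::str::from_utf8(stored)
                .map(str::to_owned)
                .map_err(|_| DecodeError::InvalidUtf8),
            FilenameCodec::Ansi => {
                // Assumption: the code page is ASCII-compatible, so pure-ASCII
                // names decode trivially. Anything else needs the platform's
                // code page tables, which this sketch does not implement.
                if stored.is_ascii() {
                    Ok(String::from_utf8_lossy(stored).into_owned())
                } else {
                    Err(DecodeError::AnsiUnsupported)
                }
            }
        }
    }
}
```

The same stored bytes can then decode differently (or fail) depending on which
codec the repository declares, which is why the choice can't be global.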

> I am not 100% sure of what the compatibility layer should do. From what 
> I can see, rust-cpython enforces UTF8 when the string is non-ascii 
> (https://docs.rs/cpython/0.3.0/src/cpython/objects/string.rs.html#242), 
> which we cannot accept as filenames can be something other than UTF8. Is 
> using https://doc.rust-lang.org/std/ffi/struct.CStr.html the right 
> solution, or am I going in the wrong direction?

CStr doesn't seem appropriate. It's designed for C interop, i.e. for borrowed
NUL-terminated strings.
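
A small illustration of the mismatch, using only std: a filename handed over
as a length-delimited byte slice has no trailing NUL, so it can't be borrowed
as a CStr without copying. The helper name is made up for the example.

```rust
use std::ffi::CStr;

/// Hypothetical helper: true if `bytes` can be viewed as a CStr in place,
/// i.e. it ends with exactly one NUL and has no interior NUL.
pub fn is_borrowable_as_cstr(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```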

