Rust: the issue of filenames and Python interop

Yuya Nishihara yuya at tcha.org
Tue Aug 27 19:30:54 EDT 2019


On Tue, 27 Aug 2019 18:02:28 +0200, Georges Racinet wrote:
> On 8/27/19 4:54 PM, Raphaël Gomès wrote:
> > On 8/27/19 3:37 PM, Yuya Nishihara wrote:
> >> Regarding Windows filenames, Mercurial uses ANSI (or MBCS) encoding,
> >> whereas Rust's OsStr is basically Unicode (WTF-8). If WindowsUTF8Plan
> >> is implemented, things will get more complicated. Filenames may be
> >> either ANSI or UTF-8, so we'll have to select the codec per repository.
> >>
> >> https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
> >
> > Am I correct in saying our main issue is that paths can be created and
> > used by different platforms? Should we split "local-only" paths to use
> > Rust's OsString and "shared" paths to use HgPath?
> This sounds indeed clean to me, and I guess we'll come to really
> appreciate the stability of TryInto.

Sounds generally good to me.

We have a repository-relative path (or canonical path) used in the
repository API, which is basically:

 - POSIX-like even on Windows,
 - no leading slash,
 - no "." or ".." components with special meaning,
 - stored in repository and shared across platforms.

Let's call it an HgPath.
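
To make the rules above concrete, here is a minimal sketch of such a type in Rust. The names HgPathBuf and HgPathError are hypothetical, not an existing API, and the sketch assumes that "." and ".." components are simply rejected rather than resolved. Bytes are stored as-is, with no encoding assumed.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum HgPathError {
    /// The path starts with '/'.
    LeadingSlash,
    /// The path contains a "." or ".." component.
    DotComponent,
}

/// Hypothetical owned repository-relative path: raw bytes, '/' separated,
/// identical on every platform.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HgPathBuf(Vec<u8>);

impl HgPathBuf {
    /// Checks the rules listed above and stores the bytes unchanged.
    pub fn new(bytes: &[u8]) -> Result<Self, HgPathError> {
        if bytes.first() == Some(&b'/') {
            return Err(HgPathError::LeadingSlash);
        }
        // Assumption: "." and ".." components are rejected, not resolved.
        for component in bytes.split(|&b| b == b'/') {
            if component == b"." || component == b".." {
                return Err(HgPathError::DotComponent);
            }
        }
        Ok(HgPathBuf(bytes.to_vec()))
    }

    /// Returns the stored bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

Note that a name like ".hidden" is still accepted, since only whole components equal to "." or ".." are refused.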

When we want to access a filelog, for example, we'll convert an HgPath to
a platform-specific path, which is where Rust's Path or OsStr comes into
play. On Unix, it's just a byte-for-byte conversion. On Windows, it has to
be decoded from MBCS to WTF-8. If WindowsUTF8Plan is implemented, the
source character encoding will be determined on a per-repository basis.
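
A rough sketch of that conversion, for illustration only. The function name hg_path_to_path_buf is hypothetical. The Unix branch is the byte-for-byte case. The non-Unix branch is just a placeholder that accepts valid UTF-8 only; it does not do the MBCS decoding described above, which would need the platform's codec (and, with WindowsUTF8Plan, a per-repository choice).

```rust
use std::path::{Path, PathBuf};

/// Unix: the bytes of the HgPath are used unchanged as an OsStr.
#[cfg(unix)]
pub fn hg_path_to_path_buf(root: &Path, hg_path: &[u8]) -> Option<PathBuf> {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;

    Some(root.join(OsStr::from_bytes(hg_path)))
}

/// Elsewhere: placeholder only. A real Windows version would decode the
/// bytes with the repository's codec (MBCS or UTF-8) instead of assuming
/// UTF-8, and would not return None for other inputs.
#[cfg(not(unix))]
pub fn hg_path_to_path_buf(root: &Path, hg_path: &[u8]) -> Option<PathBuf> {
    let text = std::str::from_utf8(hg_path).ok()?;
    // Split on '/' so the platform's own separator is used when joining.
    let relative: PathBuf = text.split('/').collect();
    Some(root.join(relative))
}
```

The point of keeping this in one function is that callers only ever hold HgPath bytes, and the platform-specific types show up right at the filesystem boundary.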


More information about the Mercurial-devel mailing list