Rust: the issue of filenames and Python interop
Yuya Nishihara
yuya at tcha.org
Tue Aug 27 19:30:54 EDT 2019
On Tue, 27 Aug 2019 18:02:28 +0200, Georges Racinet wrote:
> On 8/27/19 4:54 PM, Raphaël Gomès wrote:
> > On 8/27/19 3:37 PM, Yuya Nishihara wrote:
> >> Regarding Windows filenames, Mercurial uses ANSI (or MBCS) encoding,
> >> whereas Rust's OsStr is basically Unicode (WTF-8) on Windows. If
> >> WindowsUTF8Plan is implemented, things will get more complicated.
> >> Filenames may be either ANSI or UTF-8, so we'll have to select the
> >> codec per repository.
> >>
> >> https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
> >
> > Am I correct in saying our main issue is that paths can be created and
> > used by different platforms? Should we split "local-only" paths to use
> > Rust's OsString and "shared" paths to use HgPath?
> This indeed sounds clean to me, and I guess we'll come to really
> appreciate the stability of TryInto.
Sounds generally good to me.
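To make the "local-only" vs. "shared" split a bit more concrete, here is a
minimal sketch of a fallible conversion from a local std::path::Path into a
shared, repository-relative path using TryFrom (stable since Rust 1.34, which
is presumably the TryInto stability mentioned above). HgPathBuf and
FromLocalPathError are hypothetical names, not an existing API, and accepting
only UTF-8 names is a simplifying assumption; a real implementation would
apply the repository's filename encoding instead.

```rust
use std::convert::TryFrom;
use std::path::{Component, Path};

/// Hypothetical owned repository-relative ("shared") path.
#[derive(Debug, PartialEq)]
pub struct HgPathBuf(Vec<u8>);

/// Hypothetical reasons a local path cannot become a shared path.
#[derive(Debug, PartialEq)]
pub enum FromLocalPathError {
    /// The path is absolute or carries a Windows drive/UNC prefix.
    NotRelative,
    /// The path contains ".." or starts with ".".
    SpecialComponent,
    /// Assumption of this sketch: only UTF-8 names are accepted.
    NotUtf8,
}

impl TryFrom<&Path> for HgPathBuf {
    type Error = FromLocalPathError;

    fn try_from(path: &Path) -> Result<Self, Self::Error> {
        let mut parts: Vec<&str> = Vec::new();
        // `Path::components` splits on the platform's separators and already
        // drops "." entries that are not at the start of the path.
        for component in path.components() {
            match component {
                Component::Normal(name) => {
                    parts.push(name.to_str().ok_or(FromLocalPathError::NotUtf8)?)
                }
                Component::Prefix(_) | Component::RootDir => {
                    return Err(FromLocalPathError::NotRelative)
                }
                Component::CurDir | Component::ParentDir => {
                    return Err(FromLocalPathError::SpecialComponent)
                }
            }
        }
        // Always join with '/', whatever the local separator is.
        Ok(HgPathBuf(parts.join("/").into_bytes()))
    }
}
```

Since std provides a blanket TryInto impl for every TryFrom impl, callers
could equally write `let p: HgPathBuf = local_path.try_into()?;`.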
We have the repository-relative path (or canonical path) used in the
repository API, which is basically:
- posix-like ('/'-separated) even on Windows,
- no leading slash,
- no "." or ".." components with special meaning,
- stored in the repository and shared across platforms.
Let's call it an HgPath.
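The invariants listed above could be enforced when an HgPath is constructed.
The following is only a sketch: the HgPath and HgPathError types are
hypothetical, the bytes are deliberately not decoded in any encoding, and
rejecting empty components ("a//b", a trailing slash, or an empty path) is an
extra assumption not stated in the list.

```rust
/// Hypothetical borrowed repository-relative path, checked on construction.
#[derive(Debug, PartialEq)]
pub struct HgPath<'a>(&'a [u8]);

/// Hypothetical reasons raw bytes are not a valid HgPath.
#[derive(Debug, PartialEq)]
pub enum HgPathError {
    LeadingSlash,
    SpecialComponent,
    /// Assumption of this sketch: "a//b", "dir/" and "" are rejected.
    EmptyComponent,
}

impl<'a> HgPath<'a> {
    /// Checks the invariants. The separator is always b'/', even on
    /// Windows, and the bytes are never interpreted as text.
    pub fn new(bytes: &'a [u8]) -> Result<Self, HgPathError> {
        if bytes.first() == Some(&b'/') {
            return Err(HgPathError::LeadingSlash);
        }
        for component in bytes.split(|&b| b == b'/') {
            if component == b"." || component == b".." {
                return Err(HgPathError::SpecialComponent);
            }
            if component.is_empty() {
                return Err(HgPathError::EmptyComponent);
            }
        }
        Ok(HgPath(bytes))
    }

    /// The raw repository bytes, exactly as stored.
    pub fn as_bytes(&self) -> &[u8] {
        self.0
    }
}
```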
When we want to access a filelog, for example, we'll convert an HgPath to
a platform-specific path, which is where Rust's Path or OsStr comes into play.
On Unix, it's just a byte-to-byte conversion. On Windows, the path has to be
decoded from MBCS to WTF-8. If WindowsUTF8Plan is implemented, the source
character encoding will be determined on a per-repository basis.
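A rough sketch of that HgPath-to-local-path step, assuming a hypothetical
per-repository RepoFilenameEncoding setting and a hypothetical to_local_path
function (neither is an existing Mercurial API). The Unix branch uses the real
std::os::unix::ffi::OsStrExt::from_bytes. On Windows only the UTF-8 case is
handled; decoding the ANSI code page would go through the Win32
MultiByteToWideChar function and OsString::from_wide, which this sketch leaves
out.

```rust
use std::path::PathBuf;

/// Hypothetical per-repository setting for how filenames are stored.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RepoFilenameEncoding {
    /// Current behaviour on Windows: ANSI (MBCS) code page.
    Mbcs,
    /// What WindowsUTF8Plan would allow: UTF-8.
    Utf8,
}

/// Converts repository path bytes into a local path.
/// Returns None when the bytes cannot be represented locally.
#[cfg(unix)]
pub fn to_local_path(hg_path: &[u8], _encoding: RepoFilenameEncoding) -> Option<PathBuf> {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    // On Unix an OsStr is just bytes: a byte-to-byte conversion,
    // independent of the repository encoding.
    Some(PathBuf::from(OsStr::from_bytes(hg_path)))
}

#[cfg(windows)]
pub fn to_local_path(hg_path: &[u8], encoding: RepoFilenameEncoding) -> Option<PathBuf> {
    match encoding {
        // UTF-8 repository: decode, then use the native separator.
        RepoFilenameEncoding::Utf8 => {
            let text = std::str::from_utf8(hg_path).ok()?;
            Some(PathBuf::from(text.replace('/', "\\")))
        }
        // ANSI repository: would need MultiByteToWideChar (CP_ACP) plus
        // OsString::from_wide; not implemented in this sketch.
        RepoFilenameEncoding::Mbcs => None,
    }
}
```

Taking the encoding as an explicit argument keeps the per-repository choice
visible at every call site rather than hiding it in a global.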