[PATCH 1 of 8 RFC] vfs: replace invocation of file APIs of os module by ones via vfs

Adrian Buehlmann adrian at cadifra.com
Sat Jun 16 16:34:47 CDT 2012


On 2012-06-16 19:19, Adrian Buehlmann wrote:
> But my dumb uninformed impression is this whole path juggling needs to
> be done with those unicode strings. So, for example, scmutil.canonpath
> would have to operate on unicode strings. And perhaps a higher layer
> would then convert the relative path to UTF-8 so the higher levels of
> mercurial can be shielded from those unicode paths. But I don't see how
> anything else can work but using the wide APIs for file system accesses,
> which includes the store, as the root may be a "unicode" path already.

Some further, perhaps stupid and wild ideas:

For the openers (e.g. scmutil.opener) I think we might have to put a unicode
string into base (see scmutil.py):

199:    def __init__(self, base, audit=True):
200:        self.base = base

For the store openers, the path parameter on __call__

218:    def __call__(self, path, mode="r", text=False, atomictemp=False):

would then be a plain ASCII string, as the filenames in the store are
all encoded already, using ASCII characters only.

Then the join function

293:    def join(self, path):
294:        return os.path.join(self.base, path)

needs to return a unicode string, which is formed by using the "base"
unicode string and joining it with the ASCII path.
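As a rough sketch of that join (not actual Mercurial code: the class name
and the example path are made up, and it assumes store paths arrive as pure
ASCII byte strings):

```python
import os


class UnicodeBaseOpener(object):
    """Hypothetical opener whose base is a unicode string."""

    def __init__(self, base):
        # Assumption: base has already been turned into a unicode string.
        self.base = base

    def join(self, path):
        # Store filenames are assumed to be ASCII-only byte strings, so a
        # strict ASCII decode is lossless and fails loudly on anything else.
        if isinstance(path, bytes):
            path = path.decode('ascii')
        return os.path.join(self.base, path)


# Made-up repo root containing Japanese characters.
opener = UnicodeBaseOpener(u'/repos/\u65e5\u672c\u8a9e/.hg/store')
joined = opener.join(b'data/foo.i')  # a unicode string, never bytes
```

The strict decode is deliberate here: a non-ASCII store path would point to
a bug in the filename encoding, so it seems better to raise than to guess.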

join() is used in __call__() to form the final, complete path f

224:        f = self.join(path)

which needs to be a unicode string as well (on Windows, of course).

We then need a unicode version of util.posixfile

261:        fp = util.posixfile(f, mode)

which takes the unicode filename f.

So we would then also need a unicode version of posixfile for Windows in
osutil.c, line 410.
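To illustrate only the calling convention, here is a minimal pure-Python
stand-in. The name unicodeposixfile is hypothetical, and whatever special
handling the C implementation in osutil.c does is not reproduced here; the
sketch merely insists on a unicode filename and hands it to Python's own
open, which, as far as I know, goes through the wide API on Windows when
given a unicode name:

```python
import io


def unicodeposixfile(name, mode='rb'):
    """Hypothetical stand-in for a unicode-aware util.posixfile."""
    # Refuse byte strings so a non-unicode path cannot slip through
    # unnoticed from a caller that has not been converted yet.
    if isinstance(name, bytes):
        raise TypeError('expected a unicode filename, got a byte string')
    # Delegate to Python's own file opening with the unicode name.
    return io.open(name, mode)
```

A caller would then do something like fp = unicodeposixfile(f, mode), with f
being the unicode string returned by join().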

The store openers need to be unicode-aware because of the base.

base is somewhere under the repo root, which in turn can contain funny
(non-ASCII) characters, e.g. Japanese.

I think this has to be done unconditionally, if we want to support repo
roots with funny paths.

Likewise, the base of the wopeners needs to be a unicode string as well,
for the same reasons.

But there, we most likely want the path parameter on __call__ to be in
UTF-8 or some other encoding (e.g. latin1?), depending on other
conditions (the switching as per Matt's ideas).
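A sketch of how such a wopener might decode its path parameter before
joining. Everything here is hypothetical: the class name, the pathencoding
parameter, and the idea that the encoding is simply passed in, since how the
switching would really be decided is left open above:

```python
import os


class WorkingDirOpener(object):
    """Hypothetical working-directory opener with a unicode base."""

    def __init__(self, base, pathencoding='utf-8'):
        self.base = base                  # unicode string
        self.pathencoding = pathencoding  # e.g. 'utf-8' or 'latin-1'

    def join(self, path):
        # Higher layers hand in byte strings in the configured encoding;
        # convert them to unicode only at this file system boundary.
        if isinstance(path, bytes):
            path = path.decode(self.pathencoding)
        return os.path.join(self.base, path)


name = u'caf\u00e9.txt'
utf8path = WorkingDirOpener(u'/wd').join(name.encode('utf-8'))
latin1path = WorkingDirOpener(u'/wd', 'latin-1').join(name.encode('latin-1'))
# Both byte encodings end up as the same unicode path.
```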





