Request vetting a fix for my win32lfn extension
Nikolaj Sjujskij
sterkrig at myopera.com
Thu Nov 8 00:33:02 CST 2012
Den 2012-11-08 03:11:35 skrev Aaron Cohen <aaron at assonance.org>:
> Since around Mercurial 2.3, my extension has been broken, by the change
> to
> move pack_dirstate to C.
>
> I don't do a very good job of updating and testing new versions, so
> recently a bug was filed against it which was the first time I know if
> it.
> I've now "fixed" it, but I would appreciate some help determining if my
> fix
> is correct.
>
> Basically, win32lfn replaces python's normal os.listdir with a call to
> FindFilesIterator. This is to allow the purpose of my extension to work
> (I
> transparently convert long directory names to something like
> \\?\long_directory_name).
>
> I was a little careless when I implemented this, and did not notice that
> FindFilesIterator returns a unicode string rather than a normal string
> for
> file names, which meant that I was putting unicode into the dirstate map
> as
> the key for the file. This was apparently a harmless oversight.
>
> However, the new C code explicitly checks that the key is a python
> string,
> and nothing else, which broke my extension. I have no problem with this
> breakage, I'd just appreciate it if someone who knows a little more about
> encoding than me, could verify that my simple fix is reasonable.
>
> You can see the fix here on bitbucket:
> https://bitbucket.org/remleduff/win32lfn/changeset/3d9fa2890fa0712e3f286710d379c5d6
>
> This fix is simply to convert the unicode string that I receive from the
> Windows API function to a string by calling "filename.encode()".
>
> Is there anything that immediately jumps out as incorrect here?
It would fail for every filename with non-ASCII symbols. Probably it
should be `filename.encode('utf8')`, since it's internal string format in
Mercurial (if I'm not mistaken). My first idea was
`filename.encode(encoding._getpreferredencoding())` or something from
`mercurial.encoding` module, but it's hardly correct, now when I think
about it.
See http://mercurial.selenic.com/wiki/EncodingStrategy
More information about the Mercurial-devel
mailing list