Request vetting a fix for my win32lfn extension

Nikolaj Sjujskij sterkrig at myopera.com
Thu Nov 8 00:33:02 CST 2012


Den 2012-11-08 03:11:35 skrev Aaron Cohen <aaron at assonance.org>:

> Since around Mercurial 2.3, my extension has been broken, by the change  
> to
> move pack_dirstate to C.
>
> I don't do a very good job of updating and testing new versions, so
> recently a bug was filed against it which was the first time I know if  
> it.
> I've now "fixed" it, but I would appreciate some help determining if my  
> fix
> is correct.
>
> Basically, win32lfn replaces python's normal os.listdir with a call to
> FindFilesIterator. This is to allow the purpose of my extension to work  
> (I
> transparently convert long directory names to something like
> \\?\long_directory_name).
>
> I was a little careless when I implemented this, and did not notice that
> FindFilesIterator returns a unicode string rather than a normal string  
> for
> file names, which meant that I was putting unicode into the dirstate map  
> as
> the key for the file. This was apparently a harmless oversight.
>
> However, the new C code explicitly checks that the key is a python  
> string,
> and nothing else, which broke my extension. I have no problem with this
> breakage, I'd just appreciate it if someone who knows a little more about
> encoding than me, could verify that my simple fix is reasonable.
>
> You can see the fix here on bitbucket:
> https://bitbucket.org/remleduff/win32lfn/changeset/3d9fa2890fa0712e3f286710d379c5d6
>
>  This fix is simply to convert the unicode string that I receive from the
> Windows API function to a string by calling "filename.encode()".
>
> Is there anything that immediately jumps out as incorrect here?
  It would fail for every filename with non-ASCII symbols. Probably it  
should be `filename.encode('utf8')`, since it's internal string format in  
Mercurial (if I'm not mistaken). My first idea was  
`filename.encode(encoding._getpreferredencoding())` or something from  
`mercurial.encoding` module, but it's hardly correct, now when I think  
about it.

See http://mercurial.selenic.com/wiki/EncodingStrategy


More information about the Mercurial-devel mailing list