[PATCH in crew-stable] subrepo: encode unicode path names (issue3610)

Martin Geisler martin at geisler.net
Mon Sep 10 16:38:25 CDT 2012


Bryan O'Sullivan <bos at serpentine.com> writes:

> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1346798764 25200
> # Branch stable
> # Node ID cb12d3ce56072c55ad011e806d781873dc2cfe61
> # Parent  0cec762790ed34c469ce67b8ca8223545c57e148
> subrepo: encode unicode path names (issue3610)
>
> Subversion 1.7 changes its XML output to include an explicit encoding tag:
>
>   <?xml version="1.0" encoding="UTF-8"?>
>
> This triggers xml.dom.minidom to always return unicode strings, causing
> other parts of the code to explode.
>
> We unconditionally encode path names before handing them back, which
> works with both str (actually a no-op) and unicode values.
>
> diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
> --- a/mercurial/subrepo.py
> +++ b/mercurial/subrepo.py
> @@ -838,7 +838,7 @@
>              name = ''.join(c.data for c
>                             in e.getElementsByTagName('name')[0].childNodes
>                             if c.nodeType == c.TEXT_NODE)
> -            paths.append(name)
> +            paths.append(name.encode('utf-8'))
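
A minimal Python 3 sketch of what the patched loop does. In Python 3, minidom always returns text (str), so the situation the patch describes is the default there. The sample XML is hypothetical and only loosely modeled on Subversion's XML listing output. The `kind == 'file'` filter and the `file_paths` helper are illustrative assumptions; they are not taken from the patch.

```python
import xml.dom.minidom

# Hypothetical XML loosely shaped like Subversion's XML listing output,
# including the explicit encoding declaration that Subversion 1.7 emits.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lists>
  <list path=".">
    <entry kind="file"><name>caf\u00e9.txt</name></entry>
    <entry kind="dir"><name>docs</name></entry>
    <entry kind="file"><name>readme</name></entry>
  </list>
</lists>
"""


def file_paths(xml_bytes):
    """Return names of 'file' entries as UTF-8 bytes (hypothetical helper)."""
    doc = xml.dom.minidom.parseString(xml_bytes)
    paths = []
    for e in doc.getElementsByTagName('entry'):
        # Assumed filter: only report file entries, skip directories.
        if e.getAttribute('kind') != 'file':
            continue
        # Same text-node join as the quoted code; the result is str.
        name = ''.join(c.data for c
                       in e.getElementsByTagName('name')[0].childNodes
                       if c.nodeType == c.TEXT_NODE)
        # Encode unconditionally, as the patch does.
        paths.append(name.encode('utf-8'))
    return paths


print(file_paths(SAMPLE.encode('utf-8')))  # [b'caf\xc3\xa9.txt', b'readme']
```

Feeding the parser bytes lets expat honor the document's own encoding declaration.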

Is UTF-8 always the right encoding here?

From looking briefly at this, I see that self.files is called in
abstractsubrepo.archive. There the file names are filtered using a match
object and later passed to self.filedata. That in turn calls

  svn cat PATH

I think both the match object and Subversion would expect to see a path
encoded in the current locale's encoding (rather than always in UTF-8).
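
A small sketch of the alternative raised above: encode the name with the locale's preferred encoding instead of a fixed UTF-8. It uses only the standard library `locale` module. The `encode_path` helper is hypothetical and does not reflect how Mercurial handles local encodings. It shows that the two choices produce different bytes whenever the locale is not UTF-8.

```python
import locale


def encode_path(name, encoding=None):
    """Encode a text path name (hypothetical helper).

    With no explicit encoding, fall back to the locale's preferred
    encoding rather than hard-coding UTF-8.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Strict errors: an unencodable name raises UnicodeEncodeError instead
    # of silently producing bytes that no tool would match.
    return name.encode(encoding)


name = 'caf\u00e9.txt'
print(encode_path(name, 'utf-8'))    # b'caf\xc3\xa9.txt'
print(encode_path(name, 'latin-1'))  # b'caf\xe9.txt'
```

Under a Latin-1 locale, the UTF-8 bytes from the patch would not match the path that the match object or `svn cat` expects.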

-- 
Martin Geisler


More information about the Mercurial-devel mailing list