RFC: safe pattern matching for problematic encoding

Thu May 24 06:23:04 CDT 2012

At Wed, 23 May 2012 22:50:53 +0200,
Martin Geisler wrote:

> Matt Mackall <mpm at selenic.com> writes:
> 
> > On Wed, 2012-05-23 at 15:14 +0200, Antoine Pitrou wrote:
> >> On Wed, 23 May 2012 14:53:43 +0200
> >> Mads <mads at kiilerich.com> wrote:
> >> > 
> >> > > As you noticed, wrapping/hooking points are scattered in widely, so I
> >> > > think that this implementation is not so good. But I don't have any
> >> > > other ideas.
> >> > >
> >> > > Are there any other ideas to solve this problem ?
> >> > 
> >> > The only viable solution is to consistently use utf-8 inside Mercurial.
> >> 
> >> Or to consistently use unicode strings ;)
> >
> > <rage class=python3>
> > Yes, please go waste the next year or two of your life working on that
> > brilliant idea. Don't come back until you can preserve mixed filename
> > encodings on Linux while interoperating with old hg clients. Best of
> > luck.
> > </rage>
> 
> Is mixed filename encodings really something we want to support?
> 
> It sounds like a super rare situation to me, and a situation that the
> users would be happy to correct if it is detected.
> 
> Some users would probably say "why did you even allow me to make this
> mess in the first place?!" and consider it a bug that such repositories
> can exist today.

I understand that Matt is worried about the time lag between:

  - changes to the Unicode specification
  - each platform's Unicode (or wchar?) file API catching up with them
  - Python's encoding implementation catching up for Unicode decoding

and about the gaps in support level between these components in each
environment.

# Or is my understanding still insufficient?
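
As a rough illustration of the points in the list above, here is a minimal
Python 3 sketch. It is not Mercurial code; the file name and the helper
decodes_as_utf8() are hypothetical, made up only for this example. It shows
that the Unicode database version is tied to the Python build, and that two
byte-level spellings of the same file name on Linux cannot both be decoded
with a single encoding.

```python
import unicodedata

# Version of the Unicode database bundled with this Python build. It can
# lag behind the current Unicode standard and differ between installations.
print(unicodedata.unidata_version)

# The same (hypothetical) file name as two users might have stored it on
# Linux, where a file name is just a sequence of bytes.
name = "caf\u00e9.txt"
as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("latin-1")


def decodes_as_utf8(raw):
    """Return True if the raw file name bytes are valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(as_utf8))
print(decodes_as_utf8(as_latin1))
```

The second name fails to decode, so a repository holding both names cannot
be converted to unicode strings with one encoding without losing the
original bytes of at least one of them.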

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp