[PATCH V2] fsmonitor: match watchman and filesystem encoding

Olivier Trempe oliviertrempe at gmail.com
Thu Apr 6 12:46:38 EDT 2017


On Thu, Apr 6, 2017 at 9:45 AM, Yuya Nishihara <yuya at tcha.org> wrote:

> On Wed, 5 Apr 2017 10:55:17 -0700, Siddharth Agarwal wrote:
> > On 4/5/17 08:42, Olivier Trempe wrote:
> > > # HG changeset patch
> > > # User Olivier Trempe <oliviertrempe at gmail.com>
> > > # Date 1488981822 18000
> > > #      Wed Mar 08 09:03:42 2017 -0500
> > > # Branch stable
> > > # Node ID 2021c3032968bef6b8d1cd7bea5a22996ced994c
> > > # Parent  68f263f52d2e3e2798b4f1e55cb665c6b043f93b
> > > fsmonitor: match watchman and filesystem encoding
> > >
> > > watchman's path encoding can differ from the filesystem encoding. For example,
> > > on Windows, it's always UTF-8.
> > >
> > > Before this patch, on Windows, a mismatch in path comparison between fsmonitor
> > > state and osutil.statfiles would yield a clean status for added/modified files.
> > >
> > > In addition to status reporting wrong results, this leads to files being
> > > discarded from changesets while doing history editing operations such as rebase.
> >
> > This patch looks correct to me, though I have questions about its
> > performance below.
> >
> > +cc foozy for another look.
>
> [...]
>
> > > +_watchmanencoding = pywatchman.encoding.get_local_encoding()
> > > +_fsencoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
> > > +_fixencoding = codecs.lookup(_watchmanencoding) != codecs.lookup(_fsencoding)
> > > +
> > > +def _watchmantofsencoding(path):
> > > +    """Fix path to match watchman and local filesystem encoding
> > > +
> > > +    watchman's paths encoding can differ from filesystem encoding. For example,
> > > +    on Windows, it's always utf-8.
> > > +    """
> > > +    try:
> > > +        decoded = path.decode(_watchmanencoding)
> > > +    except UnicodeDecodeError as e:
> > > +        raise error.Abort(e, hint='watchman encoding error')
> >
> > Does this need to be str(e)?
>
> Perhaps.
> >
> > > +
> > > +    return decoded.encode(_fsencoding, 'replace')
>
> Maybe it's better to catch the exception here. Encoding errors would be more likely
> to happen because the Windows ANSI charset is generally narrower than UTF-*.
>

Do you mean setting the error handler to 'strict' rather than 'replace' and
wrapping the call in a try/except block?
Or just wrapping the call in a try/except block, but keeping the 'replace'
error handler?
Using the 'replace' error handler is necessary here to match the behavior
of osutil.listdir.
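A minimal Python 3 sketch of the two options under discussion, assuming UTF-8 for watchman and cp1252 for the filesystem. `Abort`, `watchman_to_fs`, `FIX_ENCODING`, and the "filesystem encoding error" hint are hypothetical stand-ins, not Mercurial's API (the patch targets Python 2 and uses `error.Abort`). The sketch passes `str(e)` as the abort message, skips re-encoding when both encodings resolve to the same codec (in the spirit of `_fixencoding`), and shows that with 'replace' the built-in codecs substitute "?" rather than raise, so a try/except around the encode call only matters with 'strict':

```python
import codecs

WATCHMAN_ENCODING = "utf-8"  # assumption: watchman on Windows
FS_ENCODING = "cp1252"       # assumption: an ANSI code page

# In the spirit of _fixencoding: compare canonical codec names so aliases
# such as "UTF8" and "utf-8" count as the same encoding.
FIX_ENCODING = codecs.lookup(WATCHMAN_ENCODING).name != codecs.lookup(FS_ENCODING).name


class Abort(Exception):
    """Hypothetical stand-in for Mercurial's error.Abort(message, hint=...)."""

    def __init__(self, message, hint=None):
        super().__init__(message)
        self.hint = hint


def watchman_to_fs(path, errors="replace"):
    """Re-encode a watchman path (bytes) into the filesystem encoding."""
    if not FIX_ENCODING:
        return path
    try:
        decoded = path.decode(WATCHMAN_ENCODING)
    except UnicodeDecodeError as e:
        # Pass a string, not the exception object, as the abort message.
        raise Abort(str(e), hint="watchman encoding error")
    try:
        # With errors="replace", built-in codecs substitute "?" for
        # unmappable characters instead of raising.
        return decoded.encode(FS_ENCODING, errors)
    except UnicodeEncodeError as e:
        # Only reachable with errors="strict" (hypothetical hint text).
        raise Abort(str(e), hint="filesystem encoding error")


japanese = "\u65e5\u672c.txt".encode("utf-8")  # a CJK file name as watchman reports it
print(watchman_to_fs(japanese))  # b'??.txt'
try:
    watchman_to_fs(japanese, errors="strict")
except Abort as e:
    print("abort:", e.hint)  # abort: filesystem encoding error
```

With 'replace', the unmappable name collapses to b'??.txt' instead of aborting; per the point above, that lossy result is what keeps the re-encoded path in line with osutil.listdir.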


More information about the Mercurial-devel mailing list