[PATCH V2] fsmonitor: match watchman and filesystem encoding

Siddharth Agarwal sid at less-broken.com
Wed Apr 5 13:55:17 EDT 2017


On 4/5/17 08:42, Olivier Trempe wrote:
> # HG changeset patch
> # User Olivier Trempe <oliviertrempe at gmail.com>
> # Date 1488981822 18000
> #      Wed Mar 08 09:03:42 2017 -0500
> # Branch stable
> # Node ID 2021c3032968bef6b8d1cd7bea5a22996ced994c
> # Parent  68f263f52d2e3e2798b4f1e55cb665c6b043f93b
> fsmonitor: match watchman and filesystem encoding
>
> watchman's paths encoding can differ from filesystem encoding. For example,
> on Windows, it's always utf-8.
>
> Before this patch, on Windows, mismatch in path comparison between fsmonitor
> state and osutil.statfiles would yield a clean status for added/modified files.
>
> In addition to status reporting wrong results, this leads to files being
> discarded from changesets while doing history editing operations such as rebase.

This patch looks correct to me, though I have questions about its 
performance below.

+cc foozy for another look.

>
> diff --git a/hgext/fsmonitor/__init__.py b/hgext/fsmonitor/__init__.py
> --- a/hgext/fsmonitor/__init__.py
> +++ b/hgext/fsmonitor/__init__.py
> @@ -91,6 +91,7 @@
>   
>   from __future__ import absolute_import
>   
> +import codecs
>   import hashlib
>   import os
>   import stat
> @@ -99,6 +100,7 @@
>   from mercurial import (
>       context,
>       encoding,
> +    error,
>       extensions,
>       localrepo,
>       merge,
> @@ -110,6 +112,7 @@
>   from mercurial import match as matchmod
>   
>   from . import (
> +    pywatchman,
>       state,
>       watchmanclient,
>   )
> @@ -159,6 +162,23 @@
>       sha1.update('\0')
>       return sha1.hexdigest()
>   
> +_watchmanencoding = pywatchman.encoding.get_local_encoding()
> +_fsencoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
> +_fixencoding = codecs.lookup(_watchmanencoding) != codecs.lookup(_fsencoding)
> +
> +def _watchmantofsencoding(path):
> +    """Fix path to match watchman and local filesystem encoding
> +
> +    watchman's paths encoding can differ from filesystem encoding. For example,
> +    on Windows, it's always utf-8.
> +    """
> +    try:
> +        decoded = path.decode(_watchmanencoding)
> +    except UnicodeDecodeError as e:
> +        raise error.Abort(e, hint='watchman encoding error')

Does this need to be str(e)?

> +
> +    return decoded.encode(_fsencoding, 'replace')
> +
>   def overridewalk(orig, self, match, subrepos, unknown, ignored, full=True):
>       '''Replacement for dirstate.walk, hooking into Watchman.
>   
> @@ -303,6 +323,8 @@
>       # for name case changes.
>       for entry in result['files']:
>           fname = entry['name']
> +        if _fixencoding:
> +            fname = _watchmantofsencoding(fname)

This is a critical path IIRC, so I'm a little concerned about 
performance here -- both on Mac/Linux where _fixencoding will (always?) 
be false and on Windows where it will (always?) be true.

Any chance you you could measure the before and after with (say) 10,000 
files in the result set? Thanks!

>           if switch_slashes:
>               fname = fname.replace('\\', '/')
>           if normalize:




More information about the Mercurial-devel mailing list