[PATCH stable] fsmonitor: match watchman and local encoding

Olivier Trempe oliviertrempe at gmail.com
Tue Mar 7 08:41:33 EST 2017


On Mon, Mar 6, 2017 at 8:14 PM, Siddharth Agarwal <sid at less-broken.com>
wrote:

> On 3/6/17 09:50, Olivier Trempe wrote:
>
>> # HG changeset patch
>> # User Olivier Trempe <oliviertrempe at gmail.com>
>> # Date 1488810111 18000
>> #      Mon Mar 06 09:21:51 2017 -0500
>> # Branch stable
>> # Node ID c9d3f8d1a57346228f5c3bb749acdff90d37e194
>> # Parent  6b00c3ecd15b26587de8cca6fab811069cba3b2f
>> fsmonitor: match watchman and local encoding
>>
>> watchman's paths encoding is os dependant. For example, on Windows, it's
>> always utf-8. This causes paths comparison mismatch when paths contain
>> non ascii
>> characters.
>>
>
> I really doubt this is correct unixes, where Watchman returns bytes as
> they are on disk, which matches exactly with what Mercurial wants.
>
> (On Windows Watchman indeed always returns UTF-8.)
>
> This is what I meant by "os dependent". You get a different behavior on a
different os. I can rewrite it to be more Windows specific.

>
>> diff -r 6b00c3ecd15b -r c9d3f8d1a573 hgext/fsmonitor/__init__.py
>> --- a/hgext/fsmonitor/__init__.py       Thu Mar 02 20:19:45 2017 -0500
>> +++ b/hgext/fsmonitor/__init__.py       Mon Mar 06 09:21:51 2017 -0500
>> @@ -99,6 +99,7 @@
>>   from mercurial import (
>>       context,
>>       encoding,
>> +    error,
>>       extensions,
>>       localrepo,
>>       merge,
>> @@ -110,6 +111,7 @@
>>   from mercurial import match as matchmod
>>     from . import (
>> +    pywatchman,
>>       state,
>>       watchmanclient,
>>   )
>> @@ -159,6 +161,20 @@
>>       sha1.update('\0')
>>       return sha1.hexdigest()
>>   +def _watchmanencodingtolocal(path):
>> +    """Fix path to match watchman and local encoding
>> +
>> +    watchman's paths encoding is os dependant. For example, on Windows,
>> it's
>>
>
> "dependent"
>
> +    always utf-8. This converts watchman encoded paths to local encoding
>> to
>> +    avoid paths comparison mismatch.
>> +    """
>> +    try:
>> +        decoded = pywatchman.encoding.decode_local(path)
>> +    except UnicodeDecodeError as e:
>> +        raise error.Abort(e, hint='watchman encoding error')
>>
>
> Could you elaborate a bit on when the exception can happen?
>

This is a defensive implementation. I did not encounter this exception
myself. However, even though watchman and pywatchman are maintained
together, there is no guarantee that the user will run the watchman build
matching the pywatchman version distributed with mercurial. Given that
getting the watchman encoding relies on pywatchman's encoding module, you
could get an error if something changes on either side. I just didn't want
to let a possible unhandled exception propagate up to the user.

>
> +
>> +    return decoded.encode(encoding.encoding, 'replace')
>> +
>>   def overridewalk(orig, self, match, subrepos, unknown, ignored,
>> full=True):
>>       '''Replacement for dirstate.walk, hooking into Watchman.
>>   @@ -302,7 +318,7 @@
>>       # Watchman tracks files.  We use this property to reconcile deletes
>>       # for name case changes.
>>       for entry in result['files']:
>> -        fname = entry['name']
>> +        fname = _watchmanencodingtolocal(entry['name'])
>>
>
> Adding a non-trivial function call here is likely going to regress
> performance. Have you measured the impact?


100000 calls to the function result in a ~0.3 seconds impact.
I can optimize it by triggering the conversion only if the encoding
returned by pywatchman is different than the local encoding. If they are
the same, the impact is close to null. If not, the impact is ~0.24 seconds.

>
>
- Siddharth
>
>           if switch_slashes:
>>               fname = fname.replace('\\', '/')
>>           if normalize:
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
>
>
>
This is a very nasty bug on windows. In common cases, you don't get
modified files when calling status. No so bad. However, I lost
modifications while doing history editing operations. Files containing
non-ascii characters in their path were silently discarded from a rebased
changeset.

Thanks for your comments. I will submit a new patch when I get feedback
from you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170307/2bf2a848/attachment.html>


More information about the Mercurial-devel mailing list