[PATCH V2] fsmonitor: match watchman and filesystem encoding
Olivier Trempe
oliviertrempe at gmail.com
Wed Apr 5 16:30:11 EDT 2017
On Wed, Apr 5, 2017 at 1:55 PM, Siddharth Agarwal <sid at less-broken.com>
wrote:
>
> On 4/5/17 08:42, Olivier Trempe wrote:
>>
>> # HG changeset patch
>> # User Olivier Trempe <oliviertrempe at gmail.com>
>> # Date 1488981822 18000
>> # Wed Mar 08 09:03:42 2017 -0500
>> # Branch stable
>> # Node ID 2021c3032968bef6b8d1cd7bea5a22996ced994c
>> # Parent 68f263f52d2e3e2798b4f1e55cb665c6b043f93b
>> fsmonitor: match watchman and filesystem encoding
>>
>> watchman's paths encoding can differ from filesystem encoding. For
example,
>> on Windows, it's always utf-8.
>>
>> Before this patch, on Windows, mismatch in path comparison between
fsmonitor
>> state and osutil.statfiles would yield a clean status for added/modified
files.
>>
>> In addition to status reporting wrong results, this leads to files being
>> discarded from changesets while doing history editing operations such as
rebase.
>
>
> This patch looks correct to me, though I have questions about its
performance below.
>
> +cc foozy for another look.
>
>
>>
>> diff --git a/hgext/fsmonitor/__init__.py b/hgext/fsmonitor/__init__.py
>> --- a/hgext/fsmonitor/__init__.py
>> +++ b/hgext/fsmonitor/__init__.py
>> @@ -91,6 +91,7 @@
>> from __future__ import absolute_import
>> +import codecs
>> import hashlib
>> import os
>> import stat
>> @@ -99,6 +100,7 @@
>> from mercurial import (
>> context,
>> encoding,
>> + error,
>> extensions,
>> localrepo,
>> merge,
>> @@ -110,6 +112,7 @@
>> from mercurial import match as matchmod
>> from . import (
>> + pywatchman,
>> state,
>> watchmanclient,
>> )
>> @@ -159,6 +162,23 @@
>> sha1.update('\0')
>> return sha1.hexdigest()
>> +_watchmanencoding = pywatchman.encoding.get_local_encoding()
>> +_fsencoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
>> +_fixencoding = codecs.lookup(_watchmanencoding) !=
codecs.lookup(_fsencoding)
>> +
>> +def _watchmantofsencoding(path):
>> + """Fix path to match watchman and local filesystem encoding
>> +
>> + watchman's paths encoding can differ from filesystem encoding. For
example,
>> + on Windows, it's always utf-8.
>> + """
>> + try:
>> + decoded = path.decode(_watchmanencoding)
>> + except UnicodeDecodeError as e:
>> + raise error.Abort(e, hint='watchman encoding error')
>
>
> Does this need to be str(e)?
>
Absolutely. This will be fixed in V3.
I also realize the "import sys" statement was removed 3 months ago. I
missed it when rebasing... This will also be fixed in V3.
>>
>> +
>> + return decoded.encode(_fsencoding, 'replace')
>> +
>> def overridewalk(orig, self, match, subrepos, unknown, ignored,
full=True):
>> '''Replacement for dirstate.walk, hooking into Watchman.
>> @@ -303,6 +323,8 @@
>> # for name case changes.
>> for entry in result['files']:
>> fname = entry['name']
>> + if _fixencoding:
>> + fname = _watchmantofsencoding(fname)
>
>
> This is a critical path IIRC, so I'm a little concerned about performance
here -- both on Mac/Linux where _fixencoding will (always?) be false and on
Windows where it will (always?) be true.
>
> Any chance you you could measure the before and after with (say) 10,000
files in the result set? Thanks!
>
There is a little overhead at module import:
python -m timeit "import hgext.fsmonitor"
Windows before patch: 1000000 loops, best of 3: 0.563 usec per loop
Windows after patch: 1000000 loops, best of 3: 0.583 usec per loop
Linx before patch: 1000000 loops, best of 3: 0.628 usec per loop
Linux after patch: 1000000 loops, best of 3: 0.579 usec per loop
10000 calls to _watchmantofsencoding:
python -m timeit -s "from hgext.fsmonitor import _watchmantofsencoding,
_fixencoding" "fname = '/path/to/file'" "for i in range(10000):" " if
_fixencoding: fname = _watchmantofsencoding(fname)"
Windows (_fixencoding is True): 100 loops, best of 3: 19.5 msec per loop
Linux (_fixencoding is False): 100 loops, best of 3: 3.08 msec per loop
Do you want me to include these results in the commit message?
>
>> if switch_slashes:
>> fname = fname.replace('\\', '/')
>> if normalize:
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170405/8e10b15f/attachment.html>
More information about the Mercurial-devel
mailing list