[PATCH 2 of 2] match: add a subclass for dirstate normalizing of the matched patterns

Matt Harbison mharbison72 at gmail.com
Mon Apr 13 18:50:39 CDT 2015


On Mon, 13 Apr 2015 13:36:53 -0400, Siddharth Agarwal  
<sid at less-broken.com> wrote:

> On 04/12/2015 04:52 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1428817161 14400
>> #      Sun Apr 12 01:39:21 2015 -0400
>> # Node ID 6172eed8aa036002775a2ed02df47be5df02acc7
>> # Parent  75835458befcf5ddcef740c1a2ef0d5ce6804928
>> match: add a subclass for dirstate normalizing of the matched patterns
>>
>> This class is only needed on case insensitive filesystems, and only for  
>> wdir
>> context matches.  It allows the user to not match the case of the items  
>> in the
>> filesystem- especially for naming directories, which dirstate doesn't  
>> handle[1].
>> Making dirstate handle mismatched directory cases is too expensive[2].
>>
>> Since dirstate doesn't apply to committed csets, this is only created by
>> overriding basectx.match() in workingctx, and only on icasefs.  The  
>> default
>> arguments have been dropped, because the ctx must be passed to the  
>> matcher in
>> order to function.
>>
>> For operations that can apply to both wdir and some other context, this  
>> ends up
>> normalizing the filename to the case as it exists in the filesystem,  
>> and using
>> that case for the lookup in the other context.  See the diff example in  
>> the
>> test.
>>
>> Previously, given a directory with an inexact case:
>>
>>   - add worked as expected
>>
>>   - diff, forget and status would silently ignore the request
>>
>>   - files would exit with 1
>>
>>   - commit, revert and remove would fail (even when the commands  
>> leading up to
>>     them worked):
>>
>>         $ hg ci -m "AbCDef" capsdir1/capsdir
>>         abort: CapsDir1/CapsDir: no match under directory!
>>
>>         $ hg revert -r '.^' capsdir1/capsdir
>>         capsdir1\capsdir: no such file in rev 64dae27060b7
>>
>>         $ hg remove capsdir1/capsdir
>>         not removing capsdir1\capsdir: no tracked files
>>         [1]
>>
>> Globs are normalized, so that the -I and -X don't need to be specified  
>> with a
>> case match.  Without that, the second last remove (with -X) removes the  
>> files,
>> leaving nothing for the last remove.  However, specifying the files as
>> 'glob:**.Txt' does not work.  Perhaps this requires 're.IGNORECASE'?
>>
>> There are only a handful of places that create matchers directly,  
>> instead of
>> being routed through the context.match() method.  Some may benefit from  
>> changing
>> over to using ctx.match() as a factory function:
>>
>>   revset.checkstatus()
>>   revset.contains()
>>   revset.filelog()
>>   revset._matchfiles()
>>   localrepository._loadfilter()
>>   ignore.ignore()
>>   fileset.subrepo()
>>   filemerge._picktool()
>>   overrides.addlargefiles()
>>   lfcommands.lfconvert()
>>   kwtemplate.__init__()
>>   eolfile.__init__()
>>   eolfile.checkrev()
>>   acl.buildmatch()
>>
>> Currently, a toplevel subrepo can be named with an inexact case.   
>> However, the
>> path auditor gets in the way of naming _anything_ in the subrepo if the  
>> top
>> level case doesn't match.
>
> So this is a TODO then?

Yes.  It might be tricky though, because localrepository._checknested()  
checks 'prefix in ctx.substate', which might mean normalizing the keys in  
workingctx.substate.
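
To make that concrete, here is a rough sketch of what normalizing the keys
could look like.  It is a toy model, not Mercurial code: foldedsubstate() is
a hypothetical helper, the substate dict is made-up data, and lower() is only
a stand-in for whatever case folding the filesystem really needs.

```python
# Toy model only: 'foldedsubstate' is a hypothetical helper and 'substate'
# is made-up data shaped like {path: (source, revision, kind)}.

def foldedsubstate(substate):
    """Map case-folded subrepo path -> path as recorded in substate.

    lower() is a stand-in for real filesystem case folding.
    """
    return dict((path.lower(), path) for path in substate)

substate = {'sub1': ('sub1', '0' * 40, 'hg')}
folded = foldedsubstate(substate)

prefix = 'Sub1'  # what the user typed on a case insensitive filesystem

# The plain membership test misses because the case differs...
plainhit = prefix in substate

# ...while a lookup on the folded key recovers the recorded name.
recorded = folded.get(prefix.lower())
```

With something like this, the nested check could look up the recorded name
first and only then test 'prefix in ctx.substate'.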

>>
>>   --- a/tests/test-subrepo-deep-nested-change.t
>>   +++ b/tests/test-subrepo-deep-nested-change.t
>>   @@ -170,8 +170,15 @@
>>      R sub1/sub2/test.txt
>>      $ hg update -Cq
>>      $ touch sub1/sub2/folder/bar
>>   +#if icasefs
>>   +  $ hg addremove Sub1/sub2
>>   +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
>>   +  [255]
>>   +  $ hg -q addremove sub1/sub2
>>   +#else
>>      $ hg addremove sub1/sub2
>>      adding sub1/sub2/folder/bar (glob)
>>   +#endif
>>      $ hg status -S
>>      A sub1/sub2/folder/bar
>>      ? foo/bar/abc
>>
>> The narrowmatcher class may need to be tweaked when that is fixed.
>>
>>
>> [1]  
>> http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
>> [2]  
>> http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
>>
>> diff --git a/mercurial/context.py b/mercurial/context.py
>> --- a/mercurial/context.py
>> +++ b/mercurial/context.py
>> @@ -1424,6 +1424,19 @@
>>              finally:
>>                  wlock.release()
>>
>> +    def match(self, pats=[], include=None, exclude=None,  
>> default='glob'):
>> +        r = self._repo
>> +
>> +        # Only a case insensitive filesystem needs magic to translate  
>> user input
>> +        # to actual case in the filesystem.
>> +        if not util.checkcase(r.root):
>> +            return matchmod.icasefsmatcher(r.root, r.getcwd(), pats,  
>> include,
>> +                                           exclude, default, False,  
>> r.auditor,
>> +                                           self)
>> +        return matchmod.match(r.root, r.getcwd(), pats,
>> +                              include, exclude, default,
>> +                              auditor=r.auditor, ctx=self)
>> +
>>      def _filtersuspectsymlink(self, files):
>>          if not files or self._repo.dirstate._checklink:
>>              return files
>> diff --git a/mercurial/match.py b/mercurial/match.py
>> --- a/mercurial/match.py
>> +++ b/mercurial/match.py
>> @@ -273,6 +273,34 @@
>>      def rel(self, f):
>>          return self._matcher.rel(self._path + "/" + f)
>>
>> +class icasefsmatcher(match):
>> +    """A matcher for wdir on case insensitive filesystems, which  
>> normalizes the
>> +    given patterns to the case in the filesystem.
>> +    """
>> +
>> +    def __init__(self, root, cwd, patterns, include, exclude, default,  
>> exact,
>> +                 auditor, ctx):
>> +        init = super(icasefsmatcher, self).__init__
>> +        self._dsnormalize = ctx.repo().dirstate.normalize
>> +
>> +        init(root, cwd, patterns, include, exclude, default, exact,  
>> auditor,
>> +             ctx)
>> +
>> +        # Exact matches must be based off of the actual user input,  
>> otherwise
>> +        # inexact case matches are treated as exact, and not noted  
>> without -v.
>> +        if not exact and self._files:
>> +            self._fmap = set(_roots(self._kp))
>> +
>> +    def _normalize(self, patterns, default, root, cwd, auditor):
>
> We shouldn't apply case normalization on exact matchers at all, I think.

Agreed.  The superclass doesn't call _normalize() if exact.

What the comment is trying to say is that _fmap needs to be updated with
roots in the user-specified case, so that m.exact('name') tests against what
the user provided.  Otherwise, existing tests drop 'adding xyz' lines when
only the case is different.  matchmod.exact([a, b, c]) always bypasses this
new class.
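
A toy illustration of the distinction, in case it helps.  This is not the
patch itself: toymatcher and fscase are made-up names, and normalize() only
stands in for dirstate.normalize().

```python
# Toy model, not Mercurial code.  'fscase' plays the role of the dirstate's
# knowledge of the real on-disk case.
fscase = {'capsdir1/capsdir/abc.txt': 'CapsDir1/CapsDir/AbC.txt'}

def normalize(path):
    """Return the on-disk case for path, or path itself if unknown."""
    return fscase.get(path.lower(), path)

class toymatcher(object):
    def __init__(self, userpats):
        # Patterns used for matching are folded to the filesystem case...
        self.files = [normalize(p) for p in userpats]
        # ...but exact() keeps testing against what the user typed.
        self.fmap = set(userpats)

    def exact(self, f):
        return f in self.fmap

m = toymatcher(['capsdir1/capsdir/abc.txt'])
```

Here m.files holds the filesystem case, so the file is still found, but
m.exact('CapsDir1/CapsDir/AbC.txt') is False because the user typed a
different case.  That is what keeps the 'adding ...' line in the output
without -v.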

I wasn't sure if both constructors should take the same parameters, in the
same order, for sanity.  If not, I can drop the 'exact' parameter.  The new
class is only ever created with exact == False anyway.

> Other than that this looks fine. dirstate.normalize is a little more
> expensive than necessary but the number of patterns is usually very  
> small.
>
> - Siddharth
>
>> +        self._kp = super(icasefsmatcher, self)._normalize(patterns,  
>> default,
>> +                                                          root, cwd,  
>> auditor)
>> +        kindpats = []
>> +        for kind, pats in self._kp:
>> +            if kind not in ('re', 'relre'):  # regex can't be  
>> normalized
>> +                pats = self._dsnormalize(pats)
>> +            kindpats.append((kind, pats))
>> +        return kindpats
>> +
>>  def patkind(pattern, default=None):
>>      '''If pattern is 'kind:pat' with a known kind, return kind.'''
>>      return _patsplit(pattern, default)[0]
>> diff --git a/tests/test-add.t b/tests/test-add.t
>> --- a/tests/test-add.t
>> +++ b/tests/test-add.t
>> @@ -176,12 +176,48 @@
>>    $ mkdir CapsDir1/CapsDir/SubDir
>>    $ echo def > CapsDir1/CapsDir/SubDir/Def.txt
>>
>> -  $ hg add -v capsdir1/capsdir
>> +  $ hg add capsdir1/capsdir
>>    adding CapsDir1/CapsDir/AbC.txt (glob)
>>    adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>
>>    $ hg forget capsdir1/capsdir/abc.txt
>>    removing CapsDir1/CapsDir/AbC.txt (glob)
>> +
>> +  $ hg forget capsdir1/capsdir
>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg add capsdir1
>> +  adding CapsDir1/CapsDir/AbC.txt (glob)
>> +  adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg ci -m "AbCDef" capsdir1/capsdir
>> +
>> +  $ hg status -A capsdir1/capsdir
>> +  C CapsDir1/CapsDir/AbC.txt
>> +  C CapsDir1/CapsDir/SubDir/Def.txt
>> +
>> +  $ hg files capsdir1/capsdir
>> +  CapsDir1/CapsDir/AbC.txt (glob)
>> +  CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ echo xyz > CapsDir1/CapsDir/SubDir/Def.txt
>> +  $ hg ci -m xyz capsdir1/capsdir/subdir/def.txt
>> +
>> +  $ hg revert -r '.^' capsdir1/capsdir
>> +  reverting CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg diff capsdir1/capsdir
>> +  diff -r 5112e00e781d CapsDir1/CapsDir/SubDir/Def.txt
>> +  --- a/CapsDir1/CapsDir/SubDir/Def.txt	Thu Jan 01 00:00:00 1970 +0000
>> +  +++ b/CapsDir1/CapsDir/SubDir/Def.txt	* +0000 (glob)
>> +  @@ -1,1 +1,1 @@
>> +  -xyz
>> +  +def
>> +
>> +  $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
>> +  $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
>> +  removing CapsDir1/CapsDir/AbC.txt (glob)
>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>  #endif
>>
>>    $ cd ..

