[PATCH 11 of 11 sparse] dirstate: integrate sparse matcher with _ignore (API)

Durham Goode durham at fb.com
Tue Jul 11 13:23:24 EDT 2017



On 7/11/17 10:16 AM, Martin von Zweigbergk wrote:
> On Mon, Jul 10, 2017 at 2:48 PM, Durham Goode <durham at fb.com> wrote:
>>
>>
>> On 7/10/17 1:04 PM, Martin von Zweigbergk wrote:
>>>
>>> On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham at fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/17 11:55 AM, Martin von Zweigbergk wrote:
>>>>>
>>>>>
>>>>> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham at fb.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (For Durham)
>>>>>>>
>>>>>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc
>>>>>>> <gregory.szorc at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> # HG changeset patch
>>>>>>>> # User Gregory Szorc <gregory.szorc at gmail.com>
>>>>>>>> # Date 1499555309 25200
>>>>>>>> #      Sat Jul 08 16:08:29 2017 -0700
>>>>>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd
>>>>>>>> # Parent  a2867557f9c2314aeea19a946dfb8e167def4fb8
>>>>>>>> dirstate: integrate sparse matcher with _ignore (API)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Why does sparse do it this way instead of intersecting the sparse
>>>>>>> matcher with the user's matcher?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not sure I understand the question.  What is the "user's matcher"
>>>>>> here?  The ignore matcher?
>>>>>
>>>>>
>>>>>
>>>>> I mean the matcher the user may have provided on the command line (or
>>>>> match.always() by default), as in "hg status dir/" (where the matcher
>>>>> would be "relpath:dir").
>>>>>
>>>>>>
>>>>>> This code produces a matcher that returns true for any file that should
>>>>>> be ignored.  Since both hgignore files and sparse-ignored files should be
>>>>>> ignored, I'm not sure how that could be expressed with intersection of
>>>>>> those two matchers?
>>>>>
>>>>>
>>>>>
>>>>> For narrowhg, we did it the other way around: filtering in instead of
>>>>> filtering out. So if the narrowspec (like sparse config, IIUC) says to
>>>>> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect
>>>>> that and list *.c files in those two directories (recursively).
>>>>
>>>>
>>>>
>>>> I'd have to look at the code to be specific, but I think the dirstate
>>>> ignore logic covers more cases than the user provided matcher logic. I'd
>>>> be surprised if all commands that hit dirstate.ignore also happened to
>>>> take patterns at the command level.
>>>
>>>
>>> If they don't, then the sparse matcher can be passed as is.
>>>
>>>>  It just seemed cleaner to have a unified
>>>> matcher for ignored files at the repo level.  The user specific matcher
>>>> can always be added on top of it later for commands that take patterns.
>>>
>>>
>>> For narrow, we have to apply the matcher when walking the manifest
>>> too. The user can pass a matcher to e.g. "hg status -c ." or "hg files
>>> -r ." and in those cases we need to intersect the narrow matcher with
>>> the user-supplied one. It seemed more natural to do the same for
>>> dirstate walks.
>>>
>>> It also seems simpler to control which directories are visited if
>>> using a positive matcher than a negative one. For example, let's say
>>> the narrow matcher is path:dir/. The narrowhg code will then restrict
>>> the walk to visit only the root directory, dir/, and subdirectories of
>>> dir/ (both for manifest walks and dirstate walks). I think we can
>>> simply make negatematcher's visitdir return False iff the
>>> narrow/sparse matcher returns 'all', so it's probably easy to get it
>>> to work. It still seems more natural to me to match what should be
>>> included.
>>>
>>
>> I don't have a strong opinion either way. When I made sparse, it was
>> specific to the working copy, so it mapped to the ignore matcher very
>> tightly. If that needs to change, that's fine.
>
> I just tried this. I think the result is cleaner, but there's a
> functional change that perhaps suggests the reason you modified the
> ignores instead of modifying the walk: untracked files outside the
> sparse config are reported as ignored with your version and are not
> reported at all with my version. That's an interesting aspect that I
> had not thought about.
>
> I had previously considered adding an option to status for narrowhg to
> make it also list files outside the narrow config. One option would be
> to not include them by default and list their status as something
> special (maybe O for outside?) since neither untracked nor ignored is
> really accurate. They may (kind of) be tracked, just not in the
> working copy, and with narrowhg with treemanifests we can't even tell
> if they're in the manifest or not.
>
> So, would you be okay with changing the behavior to not report these
> files as ignored by default and instead adding an option to status
> (maybe --no-sparse)?
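
To make the behavior difference described above concrete, here is a
minimal, self-contained Python sketch. It does not use Mercurial's
matcher API: in_sparse, status_filter_out and status_filter_in are
hypothetical names, and the sparse config is modeled as plain path
prefixes (an assumption for illustration only). It just shows why
untracked files outside the sparse config come back as ignored with one
approach and are not listed at all with the other.

```python
def in_sparse(path, sparse_prefixes):
    """Hypothetical stand-in for the sparse matcher: prefix match only."""
    return any(path.startswith(prefix) for prefix in sparse_prefixes)


def status_filter_out(untracked, hgignored, sparse_prefixes):
    """Fold sparse into the ignore matcher.

    A file is ignored if hgignore matches it OR it lies outside the
    sparse config, so every untracked file is still reported.
    """
    result = {}
    for path in untracked:
        ignored = hgignored(path) or not in_sparse(path, sparse_prefixes)
        result[path] = 'I' if ignored else '?'
    return result


def status_filter_in(untracked, hgignored, sparse_prefixes,
                     user_match=lambda path: True):
    """Intersect the sparse matcher with the user's matcher.

    Files outside the intersection are never visited, so they are not
    reported in any state.
    """
    result = {}
    for path in untracked:
        if not (in_sparse(path, sparse_prefixes) and user_match(path)):
            continue
        result[path] = 'I' if hgignored(path) else '?'
    return result


untracked = ['foo/a.c', 'baz/b.c', 'foo/x.o']
sparse = ['foo/', 'bar/']
hgignored = lambda path: path.endswith('.o')

print(status_filter_out(untracked, hgignored, sparse))
print(status_filter_in(untracked, hgignored, sparse))
```

In this toy model, baz/b.c is the interesting file: it is reported as
'I' by the first function and is absent from the second function's
result.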

If that's the only effect, I'm fine with that change.  The only time we
care about files outside the sparse checkout is when widening it: the
widening code checks whether files already exist on disk and refuses to
overwrite them if the content differs. That check could be done via
other means, though.
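
As a rough illustration of the widening check, a sketch in plain Python
follows. It is not the sparse extension's code: widening_conflicts and
its arguments are hypothetical, and it assumes the contents that
widening would write are already available in memory as bytes.

```python
import os


def widening_conflicts(root, incoming):
    """Return paths that widening must not overwrite.

    root is the working copy directory; incoming maps a relative path
    to the bytes that widening would write there. A path conflicts if
    a file already exists on disk with different content.
    """
    conflicts = []
    for relpath, newdata in sorted(incoming.items()):
        full = os.path.join(root, relpath)
        if not os.path.isfile(full):
            continue  # nothing on disk, safe to write
        with open(full, 'rb') as fh:
            if fh.read() != newdata:
                conflicts.append(relpath)
    return conflicts
```

A caller would abort the widening if the returned list is non-empty.
Files whose on-disk content already matches are not treated as
conflicts.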

