[PATCH 6 of 6] manifest: rename matches to _matches

Martin von Zweigbergk martinvonz at google.com
Tue Mar 7 17:29:10 EST 2017


On Tue, Mar 7, 2017 at 9:37 AM, Durham Goode <durham at fb.com> wrote:
>
>
> On 3/6/17 10:31 PM, Martin von Zweigbergk wrote:
>>
>> On Mon, Mar 6, 2017 at 5:40 PM, Durham Goode <durham at fb.com> wrote:
>>>
>>>
>>>
>>> On 3/6/17 5:32 PM, Martin von Zweigbergk wrote:
>>>>
>>>>
>>>> On Fri, Mar 3, 2017 at 11:34 AM, Durham Goode <durham at fb.com> wrote:
>>>>>
>>>>>
>>>>> # HG changeset patch
>>>>> # User Durham Goode <durham at fb.com>
>>>>> # Date 1488517595 28800
>>>>> #      Thu Mar 02 21:06:35 2017 -0800
>>>>> # Node ID 1bed8ee65389133923b9076909e04f66ef26f1b8
>>>>> # Parent  207763e895c7d24885df22f5b9c0df5494d77daf
>>>>> manifest: rename matches to _matches
>>>>>
>>>>> Now that there are no external consumers of manifest.matches, let's
>>>>> rename it to
>>>>> _matches so it remains internal.

treemanifest already had a _matches(), so now it has two...

>>>>> This means alternative manifest
>>>>> implementations
>>>>> no longer have to implement it, and can instead implement efficient
>>>>> versions of
>>>>> diff() and filesnotin().
>>>>
>>>>
>>>>
>>>> matches() still seems like a sensible method. Once "hg files" gains a
>>>> "-v" or "--flags" option so it can really replace "hg manifest" (or if
>>>> "hg manifest" started accepting a file pattern), it seems natural to
>>>> implement that using manifest.matches(). Would it be a problem to
>>>> queue patches 1-5 but not this one?
>>>
>>>
>>>
>>> It's not a problem to queue 1-5 without 6, but given the performance
>>> criticalness of the manifest, I'd lean towards not giving it functions
>>> that
>>> could enable future bad algorithms.  If we need the ability to iterate
>>> over
>>> the entire manifest in the future, maybe we could add a
>>> manifest.walk(match=None) function that returns an iterator.
>>
>>
>> That already exists, but it walks the filenames only, not the
>> (filename,node,flags) tuples :-) But that's besides the point, I know;
>> we'd just need to find a different name.
>>
>>> The 'walk()'
>>> part exposes caller to the fact that it is going to be an expensive
>>> operation, and the iterator aspect means we don't have to allocate an
>>> entire
>>> dict and we don't have to worry about people abusing it as a way of doing
>>> naive set math using manifests.  That gets the manifest a little closer
>>> to
>>> resembling a dirstate as well.
>>
>>
>> That makes sense. However, there's no such warning to users who do
>> *not* use a matcher. I think I took care of all those cases two years
>> ago, but if we want to make it less likely that someone does
>> set(mf.match(m)), then it seems even more important to prevent
>> set(mf).  So isn't it __iter__ and friends we should make explicit?
>
>
> Yep, we should probably head in the direction of making all full iteration
> operations explicit.
>
>> I'd also like to be able to verify that this is going in the right
>> direction. Could you share the remainder of the series too (as a
>> link)? Thanks.
>
>
> The only new commits on top of this so far are to change treemanifest.diff()
> to actually limit it's diff based on the visitdir() responses of the
> matcher.  You can see it here: https://bpaste.net/show/cba5f57820f1 (two
> patches, one to update tests, one to adjust the diff implementation. I'm
> working on migrating our internal treemanifest imlpementation to use this
> pattern now, so I can get perf numbers. Then I'll come back and do the same
> for treemanifest.filesnotin()

I'd really like to see the performance gain from this series, since
that's the motivating force behind it. I've spent some time trying it
myself. I added these two lines in treemanifest.diff._diff():

            if match and not match.visitdir(t1._dir or '.'):
                return

IIUC, the point of the series is to make diffs and status calls
(across revisions) faster when there are many matching files but a
small diff. I tried "time hg st --rev .~1 --rev . js" on some
arbitrary commit I picked from a treemanifest version of the Mozilla
repo. That speeds up from 1.3s to 0.18s. Nice! It's such a simple fix
to get that speedup, so can I get you to include it in the series? I
would have preferred to see something like:

1. add match argument to diff() and filesnotin() and optimize them
2. make copies use new manifest.diff() api and see how it's faster
3. same for merge
4. ...

That would make it an easier sell for me. But again, what I cared most
about was data to back up your claim, so I'd really like at least to
have that part of the series.

Btw, an alternative approach would have been to make
manifest.matches() lazy, but I think making it explicit like your
series does is better.


More information about the Mercurial-devel mailing list