[PATCH 6 of 6] manifest: rename matches to _matches

Durham Goode durham at fb.com
Tue Mar 7 17:45:49 EST 2017



On 3/7/17 2:29 PM, Martin von Zweigbergk wrote:
> On Tue, Mar 7, 2017 at 9:37 AM, Durham Goode <durham at fb.com> wrote:
>>
>>
>> On 3/6/17 10:31 PM, Martin von Zweigbergk wrote:
>>>
>>> On Mon, Mar 6, 2017 at 5:40 PM, Durham Goode <durham at fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 3/6/17 5:32 PM, Martin von Zweigbergk wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 3, 2017 at 11:34 AM, Durham Goode <durham at fb.com> wrote:
>>>>>>
>>>>>>
>>>>>> # HG changeset patch
>>>>>> # User Durham Goode <durham at fb.com>
>>>>>> # Date 1488517595 28800
>>>>>> #      Thu Mar 02 21:06:35 2017 -0800
>>>>>> # Node ID 1bed8ee65389133923b9076909e04f66ef26f1b8
>>>>>> # Parent  207763e895c7d24885df22f5b9c0df5494d77daf
>>>>>> manifest: rename matches to _matches
>>>>>>
>>>>>> Now that there are no external consumers of manifest.matches, let's
>>>>>> rename it to
>>>>>> _matches so it remains internal.
>
> treemanifest already had a _matches(), so now it has two...
>
>>>>>> This means alternative manifest
>>>>>> implementations
>>>>>> no longer have to implement it, and can instead implement efficient
>>>>>> versions of
>>>>>> diff() and filesnotin().
>>>>>
>>>>>
>>>>>
>>>>> matches() still seems like a sensible method. Once "hg files" gains a
>>>>> "-v" or "--flags" option so it can really replace "hg manifest" (or if
>>>>> "hg manifest" started accepting a file pattern), it seems natural to
>>>>> implement that using manifest.matches(). Would it be a problem to
>>>>> queue patches 1-5 but not this one?
>>>>
>>>>
>>>>
>>>> It's not a problem to queue 1-5 without 6, but given the performance
>>>> criticalness of the manifest, I'd lean towards not giving it functions
>>>> that
>>>> could enable future bad algorithms.  If we need the ability to iterate
>>>> over
>>>> the entire manifest in the future, maybe we could add a
>>>> manifest.walk(match=None) function that returns an iterator.
>>>
>>>
>>> That already exists, but it walks the filenames only, not the
>>> (filename,node,flags) tuples :-) But that's besides the point, I know;
>>> we'd just need to find a different name.
>>>
>>>> The 'walk()'
>>>> part exposes caller to the fact that it is going to be an expensive
>>>> operation, and the iterator aspect means we don't have to allocate an
>>>> entire
>>>> dict and we don't have to worry about people abusing it as a way of doing
>>>> naive set math using manifests.  That gets the manifest a little closer
>>>> to
>>>> resembling a dirstate as well.
>>>
>>>
>>> That makes sense. However, there's no such warning to users who do
>>> *not* use a matcher. I think I took care of all those cases two years
>>> ago, but if we want to make it less likely that someone does
>>> set(mf.match(m)), then it seems even more important to prevent
>>> set(mf).  So isn't it __iter__ and friends we should make explicit?
>>
>>
>> Yep, we should probably head in the direction of making all full iteration
>> operations explicit.
>>
>>> I'd also like to be able to verify that this is going in the right
>>> direction. Could you share the remainder of the series too (as a
>>> link)? Thanks.
>>
>>
>> The only new commits on top of this so far are to change treemanifest.diff()
>> to actually limit it's diff based on the visitdir() responses of the
>> matcher.  You can see it here: https://urldefense.proofpoint.com/v2/url?u=https-3A__bpaste.net_show_cba5f57820f1&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=uvgL5NLjKD18j2hB6yeJEE-7YJg1wLpx_R1fsWaDkwg&s=oWl_cfGafh-_1LwCNrhAraVcMcVOLDKn3rcReKm5BY4&e=  (two
>> patches, one to update tests, one to adjust the diff implementation. I'm
>> working on migrating our internal treemanifest imlpementation to use this
>> pattern now, so I can get perf numbers. Then I'll come back and do the same
>> for treemanifest.filesnotin()
>
> I'd really like to see the performance gain from this series, since
> that's the motivating force behind it. I've spent some time trying it
> myself. I added these two lines in treemanifest.diff._diff():
>
>             if match and not match.visitdir(t1._dir or '.'):
>                 return
>
> IIUC, the point of the series is to make diffs and status calls
> (across revisions) faster when there are many matching files but a
> small diff. I tried "time hg st --rev .~1 --rev . js" on some
> arbitrary commit I picked from a treemanifest version of the Mozilla
> repo. That speeds up from 1.3s to 0.18s. Nice! It's such a simple fix
> to get that speedup, so can I get you to include it in the series? I
> would have preferred to see something like:

Is the treemanifest version of mozilla available somewhere? Or would I 
need to convert it?

I can attach my two remaining patches (tests + diff optimization) to 
this series if you want. The two lines you added aren't quite enough to 
be a complete patch (t1._dir isn't the full path, and we need to handle 
file matching as well).

> 1. add match argument to diff() and filesnotin() and optimize them
> 2. make copies use new manifest.diff() api and see how it's faster
> 3. same for merge
> 4. ...
>
> That would make it an easier sell for me. But again, what I cared most
> about was data to back up your claim, so I'd really like at least to
> have that part of the series.

I went with this approach because it put all the easy, 
non-logic-affecting changes at the front for easy queueing. That put the 
least blockers between the refactor getting into core and me being able 
to make the needed changes to our internal treemanifest to get perf 
numbers on our repo, since I don't have a large vanilla treemanifest to 
test against. I assumed the promised O() changes were sufficient to 
justify the refactor at least. Then I'd provide perf numbers when 
sending the optimization patches.

> Btw, an alternative approach would have been to make
> manifest.matches() lazy, but I think making it explicit like your
> series does is better.

I considered that, but came to the same conclusion. And the code for a 
lazy one would be a lot more intricate I think and I'd have to replicate 
that in our native treemanifest.


More information about the Mercurial-devel mailing list