[PATCH 1 of 4 lazy-manifest v2] manifest.c: new extension code to lazily parse manifests

Tue Jan 13 23:42:27 CST 2015

On Tue, Jan 13, 2015 at 6:40 PM, Augie Fackler <raf at durin42.com> wrote:
>
> On Jan 13, 2015, at 9:35 PM, Martin von Zweigbergk <martinvonz at google.com> wrote:
>
>> If updates are in bulk without queries in between, it should be possible to just append the updates to the list. Then, on query, sort with a stable sort, pick the last entry for each file and drop the rest. Makes sense?
>
> Yes, but how do we tell if the code is doing something like
>
> m[‘foo’] = node1
> m[‘foo’] = node2
>
> which would fail if we inserted ‘foo’ on the first call and then didn’t manage to realize we already had an (unsorted) line for foo, but still appears to be just insertions.

I'm not sure I follow, but let me explain what I meant. Let's say we
initially have:

bar=node3
foo=node0
qux=node4

Then your two __setitem__ calls happen and we have

bar=node3
foo=node0
qux=node4
foo=node1
foo=node2

On query (__iter__, __getitem__, __contains__), we see that the dirty
bit is set and we first sort:

bar=node3
foo=node0
foo=node1
foo=node2
qux=node4

Then we keep only the last entry for each file:

bar=node3
foo=node2
qux=node4

This seems to be working to me. A problem I can see with this way of
imitating the dict interface is that way would not fail on "del
m['baz']" even though that would fail on a dict. Doesn't seems like
much of a problem in practice, but I may be wrong.

Either way, do we know that updates generally are not interleaved with
queries? I haven't bothered checking. If they are interleaved, perhaps
we could rewrite them to be less so.

>
> If we have a bulk-insert code path, we can avoid that part of the problem.
>
>>
>> If queries are interleaved, we would have a problem anyway and would have to do something smarter.
>>
>>
>> On Tue, Jan 13, 2015, 18:24 Augie Fackler <raf at durin42.com> wrote:
>>
>> On Jan 13, 2015, at 9:22 PM, Martin von Zweigbergk <martinvonz at google.com> wrote:
>>
>> >
>> >
>> > On Tue, Jan 13, 2015, 18:11 Augie Fackler <raf at durin42.com> wrote:
>> >
>> >
>> > On Jan 13, 2015, at 7:42 PM, Martin von Zweigbergk <martinvonz at google.com> wrote:
>> >
>> > > As I reported on IRC, the 'setitem' method seems slow. On the Mozilla repo, running
>> > >
>> > >   hg co -C 1813b && hg mv -q intl i18n && time hg ci -qm 'move intl'
>> > >
>> > > takes 6.3s with current hg and 1m21s with these patches applied. It may very be related to your TODO in setitem.
>> >
>> > Yeah, I think this is the point where we should drop v2, and I’ll work on a bulk-add function (shouldn’t take long) to mitigate that problem.
>> >
>> >
>> >
>> > I think I once suggested making it transparent to the user by instead storing modifications separately and then compacting them on any query operation. That should effectively be the same (performance-wise) as an explicit bulk update operation, but without the need to update any callers.
>>
>> But if someone inserts a new entry, then does it again, the binary search won’t work, which makes things frustrating. If you’ve got an idea for that, I’d love to hear it though (I haven’t come up with anything better.)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
>