[PATCH 2 of 8] transaction: track newly introduced revisions

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed May 3 20:03:30 EDT 2017


On 05/03/2017 07:24 PM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2017-05-03 14:49:53 +0200:
>>> Since changelog is append-only, and strip won't happen during a transaction.
>>> Why not just record len(changelog) during transaction start, and test
>>> len(changelog) at the end of transaction?
>>
>> Mostly for simplicity, having the pure raw information on the
>> transaction is simple and cheap. And it gives us access to all kinds of
>> native operations directly (len, membership, iteration) without the need
>> to perform the same (simple) operation all over the place. It also allows
>> user code to operate without having a changelog object, another small win.
>>
>> User code in this series is fairly simple, but we'll have smarter and
>> smarter usage of these data as more code uses it.
>
> The above are still abstract to me. Could you give some examples about the
> future usage of the new information (in a short way) ?

Currently foreseen usages are (see the sketch after this list):

* detecting that new revs have been added (__nonzero__),
* pre-sizing various cache updates with the number of changesets (__len__),
* feeding new revisions to cache-update functions and hooks (__iter__),
* issuing messages about new changesets (__len__),
* checking whether nodes with a special property (e.g. heads) have been 
  newly added (__contains__).
(There will probably be more.)
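
To make that concrete, here is a rough sketch of a post-transaction 
consumer using tr.changes['revs'] as a plain set. The helper names 
('aftertransaction', 'updatecache', 'notifynewhead') are made up for 
illustration, not actual APIs:

    def aftertransaction(repo, tr):
        newrevs = tr.changes['revs']
        if not newrevs:                  # __nonzero__: nothing was added
            return
        repo.ui.status('added %d changesets\n' % len(newrevs))  # __len__
        for rev in newrevs:              # __iter__: feed caches and hooks
            updatecache(repo, rev)       # hypothetical cache helper
        for node in repo.heads():
            if repo[node].rev() in newrevs:  # __contains__: is this head new?
                notifynewhead(repo, node)    # hypothetical notification helper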

Sure, we could derive all that data from a min/max pair. However, this 
means every single user of that data would have to redo the same 
arithmetic all over again. That is error-prone, tedious and inefficient 
(the code base already suffers from this in multiple places).
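
For comparison, here is roughly what every consumer would have to repeat 
if only the changelog length at the start of the transaction were 
recorded ('startrev' is an illustrative key name, not part of this 
series; 'updatecache' is a hypothetical helper):

    startrev = tr.changes['startrev']        # illustrative: pre-tr length
    endrev = len(repo.changelog)             # length at transaction close
    if endrev > startrev:                    # "were revs added?" done by hand
        repo.ui.status('added %d changesets\n' % (endrev - startrev))
        for rev in xrange(startrev, endrev): # every caller rebuilds the range
            updatecache(repo, rev)           # hypothetical cache helper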

In addition, 'tr.changes' will contain other types of data "in full" 
(bookmark movements, tag movements, phase changes, new obsolescence 
markers, etc.). Having the 'revs' information "in full" too is more 
consistent.
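
Schematically, 'tr.changes' is just a dictionary of recorded changes 
(the exact keys and value types below are illustrative of the intent, 
not a frozen API):

    tr.changes = {
        'revs': set(),        # revision numbers added by the transaction
        'bookmarks': {},      # bookmark movements
        'phases': {},         # phase changes
        'obsmarkers': [],     # newly added obsolescence markers
        # ... more categories as they become tracked
    }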

>> I'm not worried about the memory usage since many other place assumes we
>> can keep all repository revs in memory.
>
> Yes. But I don't think "this is already O(N)" is a good reason to add
> another "O(N)" thing given the fact we know there is an easy "O(1)"
> alternative.

If the basic approach (a native set) turns out to be an issue, we can 
turn it into something smarter (e.g. something based on 
smartset.spanset), but for now I see no point in blocking the current 
series and making the approach more complex, especially when there is 
no clear evidence it will matter.
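
As an illustration of what "something smarter" could look like, a lazy 
span-like object can expose the same interface as the set without 
storing N elements. This is only a toy sketch in the spirit of 
smartset.spanset, not the real class:

    class revspan(object):
        """Lazy stand-in for a set of consecutive revision numbers."""
        def __init__(self, start, stop):
            self._start, self._stop = start, stop
        def __nonzero__(self):                   # truthiness: anything new?
            return self._stop > self._start
        def __len__(self):                       # how many new revisions
            return max(0, self._stop - self._start)
        def __iter__(self):                      # iterate without materializing
            return iter(xrange(self._start, self._stop))
        def __contains__(self, rev):             # membership in O(1)
            return self._start <= rev < self._stop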

>>> That's O(1), instead of O(N log N)
>>
>> (note: I'm not sure what type of complexity you refer to, nor what
>> operation you are considering to get these numbers)
>
> Both time and memory.

(note: here, memory is O(N))

> If I pulled N revisions, this series is building a set
> with N elements by calling set.add N times.

If you pull N revisions, you are adding at least N new revlog entries, 
plus other implied computation on these N new revisions (e.g. cache 
updates). The time spent doing 'set.add' will be negligible compared to 
the above.

> The set is only used to detect
> new revisions added to changelog, which could be represented as an O(1)
> "xrange" by recording changlog length at the start and end of transaction
> (which has an O(1) time complexity).

You seem to be missing the goal of 'tr.changes'. Its goal is to contain 
an accurate record of the changes made by a transaction. The current 
users are simple, because we start simple.

Cheers,

-- 
Pierre-Yves David

