[PATCH 3 of 7] parsers: avoid PyList_Append when parsing obs markers

Thu Mar 16 13:12:11 EDT 2017

> On Mar 16, 2017, at 10:09, Gregory Szorc <gregory.szorc at gmail.com> wrote:
> 
>> On Thu, Mar 16, 2017 at 9:50 AM, Augie Fackler <raf at durin42.com> wrote:
>> On Mon, Mar 13, 2017 at 10:24:21PM -0700, Gregory Szorc wrote:
>> > On Mon, Mar 13, 2017 at 10:15 PM, Gregory Szorc <gregory.szorc at gmail.com>
>> > wrote:
>> > Assuming there is some truth to the numbers, holy cow the overhead of
>> > PyObject creation is crazy. I suddenly feel like auditing the C code for
>> > everywhere we're instantiating PyObjects before it is necessary. I know we
>> > do this in the revlog index. I wonder if dirstate is impacted...
>> 
>> Yeah, PyObject allocation can be pretty nasty. In my toy rust version
>> of fm1readmarkers, it's 100x faster than the C code was, then 3x
>> slower overall due to time spent copying data into PyObjects.
>> 
> !!!

Yeah, the actual parsing step was so fast I almost didn't believe it. Zero-copy parsers are pretty great.

> 
> If it is 3x slower in the end, I highly suspect something sub-optimal with the Rust -> PyObject code path. My guess would be having to allocate memory and copy data for the hashes into PyBytes instances. We can leverage memoryview or any object implementing the buffer protocol so we can avoid that copy. You still pay for the PyObject overhead. But at least you aren't making extra copies of data.

My suspicion is that it's all the string copies, yeah. I also have wondered about the bindings I'm using doing something suboptimal with the GIL, but I haven't had time to wade through it or try and profile it.

> 
> Also, I'm really curious when obs markers actually need to be realized. There is validation after parsing that successors aren't nullid. That can easily be inlined into C or deferred until first access. In theory, we could create our own Python type behaving as a sequence that turned C structs to PyObjects on demand. I'm just not sure if that buys us anything. The second you iterate over markers from Python, you throw away that optimization. 
> 
> Perhaps it would be better for the obs marker API to conceptually be a black box store that exposed primitives like "find X for <set>." That way, all the expensive computation can be done in C on raw data structures and we only realize PyObjects as needed. For cases where you are dealing with lists of fixed-width values, you could return an array of packed data instead of N PyObjects. Come to think of it, the computation of just hidden nodes might be the only super critical part to be all in Python. If we can do that in C without realizing tons PyObjects, I think a lot of read-only operations/commands just got a lot faster.

If we build that interface, the rust code I've got sitting around (using nom) is probably actually worth exploring in more detail.