children, n^2 and solutions?

Mon Jul 26 11:58:21 CDT 2010

On Jul 26, 2010, at 4:10 AM, Matt Mackall wrote:

> On Mon, 2010-07-26 at 03:13 +0200, Jason Harris wrote:
>> So I have looked into trying to make sure I do look aheads and caching
>> at the right time, and then back looking, and then graph extensions
>> when things change, and then redoing the graph extension layouts, and
>> lots more details... and well its pretty painful... So at this stage I
>> am wondering if it is possible to make Mercurial store the child
>> information in the revlog or internally somewhere in a persistent way?
>> Would this be insanely difficult?
> 
> It's actually impossible. All pointers in revlogs must point into the
> past rather than the future. We could have a bag on the side to store
> such things, but that'd be ugly and we internally have no need for it.
> 
> It should be a pretty simple matter to cache it all on your side. You
> can, for instance, read out the complete history graph all in one go
> like this:
> 
> hg debugindex .hg/store/00changelog.i
> 
> And there's a compact, faster form that works like this (1.6+):
> 
> hg debugdag
> 
> This'll dump the graph of my kernel repo (201733 csets) in less than
> three seconds. See hg debugbuilddag for parsing info.

Thanks for the information!! Its interesting! Out of interest can the people that use debugdag speak up? Ie does TortiseHg use it, or other tools. I am interested in how they use it...

Fortunately / unfortunately MacHg works incrementally. That means if I do hg log -r 200100:200200 it will give me the results about an order of magnitude faster than hg debugdag. So if I were to use it I would have to do the log command (to get the information to display to the users) and then the debugdag command as well to get the structure. Additionally the Tree can change at many hg instructions. Eg pull, merge, strip, edit, collapse, etc can cause the tree to change, so really a full update needs to be done on many commands...

Thus, hitting the users with a 3 second delay after most commands that interact with the repository would not be good... With the way I do things incrementally the users views are updated smoothly after the .3 seconds or so that the log command takes on such a large repository (Actually it would be good if this was even faster of course :) ) Of course I could imagine even more complex caching mechanisms of getting the debugdag then getting a new dag, then merging the new dag into the existing dag and updating accordingly, etc...

However in terms of esthetics and simplicity alone, cleanly interfacing Mercurial to the outside world would just seem simpler if log could access the children in a O(n) or O(1) operation.

When you say you could "We could have a bag on the side to store such things, but that'd be ugly and we internally have no need for it." A couple of questions:
1. Would such a bag be a persistent object so that given a revision number then determining the children could be O(1) when issued from the command eg hg log -r 200:270 --template "{children}" should be around 70 * O(1)?
2. Would this be possible to add as an extension or would it be too ingrained in the internals to really nicely separate out?
3. If it had to be integrated into the internals would this slow things down by more than 1 or 2% for normal operations?
4. Just to give me an idea and if its possible would you have any sense of how difficult this would be to do on the Mercurial side for someone like you or Martin or any of the other crew members? Ie like a couple hours, or maybe a day, or maybe a week, or "a long time". I am just trying to get some sense of the problem...

Just to summarize I can likely get around this problem in MacHg in an ugly way that will likely cause me troubles and be a source of bugs here and there and small glitches and side cases I am papering over, etc...  but with maybe another days work I can get something reasonably stable up and going. I had just thought though that I would not be the only one facing this problem and it would be nice to have central Mercurial be able to nicely report the children in a non O(n^2) manner.

But thanks again for the detailed comments! I will forge ahead with trying to cache things on my end (MacHg) and investigate debugdag more. Of course my humble request to be able to use children in log in O(1) or O(n) would be immensely useful to me. If somehow someone finds it in their harts to put this on their schedules I would be extremely grateful! Thank you!

Cheers,
  Jas