Enriching a file log by branches, tags and bookmarks

Marc Strapetz marc.strapetz at syntevo.com
Wed Apr 8 09:04:58 CDT 2015


On 08.04.2015 02:01, Simon King wrote:
>> On 8 Apr 2015, at 00:25, Simon King <simon at simonking.org.uk> wrote:
>>
>>
>>> On 7 Apr 2015, at 14:55, Marc Strapetz <marc.strapetz at syntevo.com> wrote:
>>>
>>> I'm taking this thread from the main list to the developers list:
>>>
>>> I'm looking for a way to enrich a file log (or sub-tree log) by branches, tags and bookmarks: every such "tag" should be displayed at the "closest" commit of the file log, i.e. at that commit which contains the file in identical state (content).
>>>
>>> Here is an example of what this should look like for setup.py of the Mercurial repository (note that this repository is quite outdated, so the most recent tags do not show up):
>>>
>>> http://i.imgur.com/AYt885I.png
>>>
>>> Using a loop of revset queries gives me the information I'm looking for, but is too inefficient (pseudo-code):
>>>
>>> $ for all $X in tags(): hg log -r "max(ancestors(tagged(X)) and
>>> file(setup.py))" -T "$X {rev}\n"
>>>
>>> With this approach, ancestors() as well as file(setup.py) are run for every tag (N times), whereas I have in mind a solution which would traverse the commit hierarchy only once or twice, possibly increasing memory usage by O(N) for the sake of constant running time. (Is that reasonable for large repositories, like the CPython repository?)
>>>
>>> Unfortunately my Python and Mercurial knowledge are quite limited and hence I'm looking here for someone who is able to write an extension to solve this problem on a work-for-hire basis.
>>>
>>> -Marc
>>>
>>>
>>
>>
>> Below is an extension that horribly abuses revsets to do something like what you are asking for. I’ve only tested it with hg 3.2.3. I hope that with the lazy revset functions in recent versions of Mercurial it is fairly efficient. In particular, I hope that iterating over "sort(::n, -rev)" is an efficient way to travel down the ancestry of a revision.

Many thanks, Simon! This is basically what I'm looking for, and the
running time with Mercurial 3.3 on the CPython repository is really good.

> Actually, on further testing I see this is buggy. I was trying to reproduce your screenshot but it is putting the 2.5-related tags on the same revision as the 2.4 ones.
>
> o    18900
> |\
> | \
> | |\
> | | \
> | | |\
> | | | \
> | | | |\
> +-+-+---o  18866 2.5.4
> | | | |/
> | | | o  18755 2.5.3
> | | | |
> +-+---o  18754
> | | |/
> +---o  17926
> | |/
> | o    17732 2.4-rc 2.4 2.4.1 2.4.2 2.5-rc 2.5 2.5.1 2.5.2

I can reproduce that. I think the core problem is that the caching
assumes a total order between all revisions. I've now changed the code
to replace 'sort(::%d, -rev)' with 'sort(ancestors(%d), -rev)', but
2.5-rc is still mapped to r17732 instead of r18754 because of the merge
at r18543.

I currently have no idea how to fix the algorithm so that it works
correctly here, though a fix would be desirable because the algorithm
is fast and memory-efficient.

A less memory-efficient approach, which I'll try to implement, is to
traverse the entire commit graph and shift tags from each revision to
all of its ancestors. Once a commit of the file's log is encountered,
the tags that reached it are reported and shifting stops for those
tags. This might result in large sets of tags being attached to many
nodes, possibly requiring large amounts of memory (though I'm not yet
sure about that).
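
To make the idea concrete, here is a rough sketch of the tag-shifting
traversal on a toy graph. It does not use the Mercurial API at all: the
function name and the three inputs (a dict of parent revisions, a dict of
tags, and a set of file-log revisions) are assumptions made up for
illustration. It also assumes that every parent has a lower revision
number than its child, so that visiting revisions in descending order
visits children before parents. When a tag can reach several file-log
commits through a merge, the sketch keeps the first one it reaches, which
is the highest-numbered one, matching the max(...) revset query quoted
above. It does not compare file contents, so whether that is the "right"
commit for the merge case discussed here is left open.

```python
from collections import defaultdict


def map_tags_to_file_log(parents, tags, file_revs):
    """Map each tag to the highest-numbered file-log revision among
    the tagged revision and its ancestors.

    parents:   dict rev -> list of parent revs (each parent < child);
               root revisions map to an empty list
    tags:      dict tag name -> tagged rev
    file_revs: set of revs that appear in the file's log
    """
    # Tags currently "sitting" on each revision, waiting to be shifted.
    pending = defaultdict(set)
    for name, rev in tags.items():
        pending[rev].add(name)

    resolved = {}
    # Descending order: every child is handled before its parents.
    for rev in sorted(parents, reverse=True):
        # Drop tags that were already reported via another branch.
        names = {n for n in pending.pop(rev, ()) if n not in resolved}
        if not names:
            continue
        if rev in file_revs:
            # File-log commit reached: report the tags and stop shifting.
            for name in names:
                resolved[name] = rev
        else:
            # Not in the file log: shift the tags to all parents.
            for parent in parents[rev]:
                pending[parent] |= names
    return resolved
```

The pending sets are where the memory concern shows up: in the worst
case each tag is copied onto every parent of every revision that lies
between the tagged commit and the file log.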

Either way, I'll come back with a modified version of your extension 
once it's working.

-Marc









