Enriching a file log by branches, tags and bookmarks

Marc Strapetz marc.strapetz at syntevo.com
Wed Apr 8 14:59:14 CDT 2015


On 08.04.2015 16:04, Marc Strapetz wrote:
> On 08.04.2015 02:01, Simon King wrote:
>>> On 8 Apr 2015, at 00:25, Simon King <simon at simonking.org.uk> wrote:
>>>
>>>
>>>> On 7 Apr 2015, at 14:55, Marc Strapetz <marc.strapetz at syntevo.com>
>>>> wrote:
>>>>
>>>> I'm taking this thread from the main list to the developers list:
>>>>
>>>> I'm looking for a way to enrich a file log (or sub-tree log) by
>>>> branches, tags and bookmarks: every such "tag" should be displayed
>>>> at the "closest" commit of the file log, i.e. at that commit which
>>>> contains the file in identical state (content).
>>>>
>>>> Here is an example of how this should look like for setup.py of the
>>>> Mercurial repository (note that this repository is quite outdated,
>>>> so most recent tags do not show up):
>>>>
>>>> http://i.imgur.com/AYt885I.png
>>>>
>>>> Using a loop of revset queries gives me the information I'm looking
>>>> for, but is too inefficient (pseudo-code):
>>>>
>>>> $ for all $X in tags(): hg log -r "max(ancestors(tagged(X)) and
>>>> file(setup.py))" -T "$X {rev}\n"
>>>>
>>>> For this approach, e.g. ancestors() as well as file(setup.py) is run
>>>> for every tag (N times), while I'm having in mind a solution which
>>>> would traverse the commit hierarchy only once or twice, possibly
>>>> increasing memory usage by O(N) for the sake of constant running
>>>> time. (Is that reasonable for large repositories, like for the
>>>> CPython repository?)
>>>>
>>>> Unfortunately my Python and Mercurial knowledge are quite limited
>>>> and hence I'm looking here for someone who is able to write an
>>>> extension to solve this problem on a work-for-hire basis.
>>>>
>>>> -Marc
>>>>
>>>>
>>>
>>>
>>> Below is an extension that horribly abuses revsets to do something
>>> like what you are asking for. I’ve only tested it with hg 3.2.3. I
>>> hope that with the lazy revset functions in recent versions of
>>> mercurial it is fairly efficient. In particular, that iterating over
>>> "sort(::n, -rev)" is an efficient way to travel down the ancestry for
>>> a revision.
>
> Many thanks, Simon! This is basically what I'm looking for and running
> time with Mercurial 3.3 on the CPython repository is really good.
>
>> Actually, on further testing I see this is buggy. I was trying to
>> reproduce your screenshot but it is putting the 2.5-related tags on
>> the same revision as the 2.4 ones.
>>
>> o    18900
>> |\
>> | \
>> | |\
>> | | \
>> | | |\
>> | | | \
>> | | | |\
>> +-+-+---o  18866 2.5.4
>> | | | |/
>> | | | o  18755 2.5.3
>> | | | |
>> +-+---o  18754
>> | | |/
>> +---o  17926
>> | |/
>> | o    17732 2.4-rc 2.4 2.4.1 2.4.2 2.5-rc 2.5 2.5.1 2.5.2
>
> I can reproduce that. I think the core problem is that the caching
> assumes a total order between all revisions. I've changed code now to
> replace 'sort(::%d, -rev)' by 'sort(ancestors(%d), -rev)', however still
> 2.5-rc is mapped to r17732 instead of r18754 due to the merge at r18543.
>
> I currently have no idea how to fix the algorithm to work correctly
> here, though that would be desirable because it's fast and
> memory-efficient.
>
> A less memory-efficient approach which I'll try to implement will be
> traversing the entire commit network and shifting tags from a revision
> to all of its ancestors. Once a commit of the file's log is encountered
> tags will be reported and shifting will be stopped for these tags. This
> might result in large sets of tags being applied to many nodes, possibly
> requiring large amounts of memory (though I'm not yet sure on that).
>
> Either way, I'll come back with a modified version of your extension
> once it's working.

I've implemented this approach as two different kinds of extension: a
log-template extension as in your original version (pushtags2.py), and a
command extension which simply prints each revision together with its
assigned tags, so that the output can be parsed by the calling program
(pushtags3.py). pushtags3.py would be called as:

$ hg pushtags setup.py
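
For the calling program, parsing could look roughly like the sketch below.
It assumes each output line has the form "REV: ['tag', ...]", i.e. the
revision number followed by the Python repr of the sorted tag list, which is
what the ui.write call in pushtags3.py appears to produce. The helper name
parse_pushtags_output and the sample text are made up for illustration; they
are not real output.

```python
import ast


def parse_pushtags_output(text):
    """Return {rev: [tag, ...]} for lines such as "12: ['v1.1', 'v1.2']"."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at the first ': ' only; tag names may contain colons.
        rev, _, taglist = line.partition(': ')
        # literal_eval reads the list literal without executing code.
        result[int(rev)] = ast.literal_eval(taglist)
    return result


# Made-up sample in the assumed format.
sample = "12: ['v1.1', 'v1.2']\n7: ['v1.0']\n"
print(parse_pushtags_output(sample))
```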

Both extensions are more or less identical, in particular the algorithm.
Interestingly, pushtags3.py is significantly faster than pushtags2.py.

Contrary to my earlier concerns, memory usage seems to be no problem,
because revtotags can easily be kept small by dropping each revision once
it has been processed (revtotags.pop(rev, None)).
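
To show the idea independently of Mercurial, here is a minimal sketch of the
same tag-shifting pass on a toy DAG held in plain dicts. The names push_tags,
parents, tags and file_revs are invented for this example and do not exist in
either extension. It assumes, as in a Mercurial changelog, that every parent
has a smaller revision number than its children.

```python
def push_tags(parents, tags, file_revs):
    """Push each tag down to the closest ancestor revision in file_revs.

    parents:   dict rev -> list of parent revs
    tags:      dict tag name -> rev the tag points at
    file_revs: set of revs that belong to the file log
    """
    pending = {}  # rev -> tags still waiting to be placed
    for tag, rev in tags.items():
        pending.setdefault(rev, set()).add(tag)

    pushed = {}        # file-log rev -> tags reported there
    processed = set()  # tags that have already been reported
    for rev in sorted(parents, reverse=True):
        # Popping keeps 'pending' small: a rev is never looked at twice.
        revtags = pending.pop(rev, set()) - processed
        if not revtags:
            continue
        if rev in file_revs:
            pushed[rev] = revtags
            processed |= revtags
        else:
            for parent in parents[rev]:
                pending.setdefault(parent, set()).update(revtags)
    return pushed


# Toy history: 4 merges 2 and 3; the file was changed in revs 0 and 2.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
tags = {'v1': 1, 'v2': 3, 'v3': 5}
print(push_tags(parents, tags, file_revs={0, 2}))
```

In the toy history, v3 ends up on rev 2 (reached through the merge), while v1
and v2 fall through to rev 0 because neither has rev 2 as an ancestor.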

-Marc



# --- pushtags2.py ---------------------------

from mercurial import revset, util, templatekw

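def pushtags(repo, subset, x):
     """Revset: evaluate x and remember, for every revision in the result,
     the tags pushed down to it from its descendants."""
     s = revset.getset(repo, subset, x)
     cl = repo.changelog
     # Map every tagged revision to the set of tags pointing at it.
     tags = [(cl.rev(node), tag) for (tag, node) in repo.tags().items()]
     revtotags = {}
     for rev, tag in tags:
         revtotags.setdefault(rev, set()).add(tag)

     tip = cl.rev(cl.tip())
     repo._pushedtags = {}
     processedtags = set()
     # Walk from tip down to revision 0; range(0, tip) would skip tip itself.
     for rev in range(tip, -1, -1):
         revtags = revtotags.get(rev, set())
         if rev in s:
             # Revision of the log: report the tags not reported so far.
             repo._pushedtags[rev] = list(revtags.difference(processedtags))
             processedtags = processedtags.union(revtags)
         elif len(revtags) > 0:
             # Not part of the log: hand the pending tags on to the parents.
             for parent in cl.parentrevs(rev):
                 revtotags.setdefault(parent, set())
                 revtotags[parent] = revtags.union(
                     revtotags[parent]).difference(processedtags)
         # rev is never visited again, so drop it to keep the map small.
         revtotags.pop(rev, None)

     return s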

revset.symbols.update({'pushtags': pushtags})

def pushedtagskw(**kwargs):
     repo, ctx = kwargs['repo'], kwargs['ctx']
     pushedtags = getattr(repo, '_pushedtags', None)
     if pushedtags is None:
         raise util.Abort('pushedtags template keyword requires '
                          'pushtags revset')

     tags = pushedtags.get(ctx.rev(), [])
     if tags:
         return templatekw.showlist('pushedtag', tags, **kwargs)

templatekw.keywords['pushedtags'] = pushedtagskw

# --- pushtags3.py ---------------------------

from mercurial import revset

def pushtags(ui, repo, path, **opts):
     """Print each revision of path's log that receives tags, newest first."""
     # History of path, as computed by the follow() revset.
     s = revset.follow(repo, revset.spanset(repo), ('symbol', path))
     cl = repo.changelog
     # Map every tagged revision to the set of tags pointing at it.
     tags = [(cl.rev(node), tag) for (tag, node) in repo.tags().items()]
     revtotags = {}
     for rev, tag in tags:
         revtotags.setdefault(rev, set()).add(tag)

     tip = cl.rev(cl.tip())
     pushedtags = {}
     processedtags = set()
     # Walk from tip down to revision 0; range(0, tip) would skip tip itself.
     for rev in range(tip, -1, -1):
         revtags = revtotags.get(rev, set())
         if rev in s:
             # Revision of the log: report the tags not reported so far.
             finaltags = revtags.difference(processedtags)
             if len(finaltags) > 0:
                 pushedtags[rev] = finaltags
             processedtags = processedtags.union(revtags)
         elif len(revtags) > 0:
             # Not part of the log: hand the pending tags on to the parents.
             for parent in cl.parentrevs(rev):
                 revtotags.setdefault(parent, set())
                 revtotags[parent] = revtags.union(
                     revtotags[parent]).difference(processedtags)
         # rev is never visited again, so drop it to keep the map small.
         revtotags.pop(rev, None)

     for rev in reversed(sorted(pushedtags)):
         ui.write(rev, ': ', sorted(pushedtags[rev]), '\n')

cmdtable = {
     'pushtags': (pushtags, [], '[options] FILE')
}

