[PATCH STABLE?] tags: create new sortdict for performance reasons

Augie Fackler raf at durin42.com
Sat Nov 21 18:19:16 CST 2015


On Nov 21, 2015 19:18, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>
> On Sat, Nov 21, 2015 at 2:06 PM, Augie Fackler <raf at durin42.com> wrote:
>>
>> On Thu, Nov 12, 2015 at 01:17:50PM -0800, Gregory Szorc wrote:
>> > # HG changeset patch
>> > # User Gregory Szorc <gregory.szorc at gmail.com>
>> > # Date 1447362964 28800
>> > #      Thu Nov 12 13:16:04 2015 -0800
>> > # Node ID c541bc4fc379f76d0cf30354942f78aaddd26f04
>> > # Parent  150fb782ef80919e1c8cc7d4b047d78f889c7846
>> > tags: create new sortdict for performance reasons
>>
>> squinting at this lightly, I suspect sortdict could go away in favor
>> of collections.ordereddict now...
>
>
> Requires 2.7 :/

Welp.

>
>>
>>
>> >
>> > sortdict internally maintains a list of keys in insertion order. When a
>> > key is replaced via __setitem__, we .remove() from this list. This
>> > involves a linear scan and array adjustment. This is an expensive
>> > operation.
>> >
>> > The tags reading code was calling into sortdict.__setitem__ for each
tag
>> > in a read .hgtags revision. For repositories with thousands of tags or
>> > thousands of .hgtags revisions, the overhead from list.remove()
>> > noticeable.
>> >
>> > This patch creates a new sortdict() so __setitem__ calls don't incur a
>> > list.remove.
>> >
>> > This doesn't appear to have any performance impact on my Firefox
>> > repository. But that's only because tags reading doesn't show up in
>> > profiles to begin with. I'm still waiting to hear from a user with over
>> > 10,000 tags and hundreds of heads on the impact of this patch.
>> >
>> > diff --git a/mercurial/tags.py b/mercurial/tags.py
>> > --- a/mercurial/tags.py
>> > +++ b/mercurial/tags.py
>> > @@ -220,11 +220,15 @@ def _readtags(ui, repo, lines, fn, recod
>> >      All node ids are binary, not hex.
>> >      '''
>> >      filetags, nodelines = _readtaghist(ui, repo, lines, fn,
recode=recode,
>> >                                         calcnodelines=calcnodelines)
>> > +    # util.sortdict().__setitem__ is much slower at replacing then
inserting
>> > +    # new entries. The difference can matter if there are thousands
of tags.
>> > +    # Create a new sortdict to avoid the performance penalty.
>> > +    newtags = util.sortdict()
>> >      for tag, taghist in filetags.items():
>> > -        filetags[tag] = (taghist[-1], taghist[:-1])
>> > -    return filetags
>> > +        newtags[tag] = (taghist[-1], taghist[:-1])
>> > +    return newtags
>> >
>> >  def _updatetags(filetags, tagtype, alltags, tagtypes):
>> >      '''Incorporate the tag info read from one file into the two
>> >      dictionaries, alltags and tagtypes, that contain all tag
>> > _______________________________________________
>> > Mercurial-devel mailing list
>> > Mercurial-devel at selenic.com
>> > https://selenic.com/mailman/listinfo/mercurial-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151121/319308ac/attachment.html>


More information about the Mercurial-devel mailing list