[PATCH STABLE?] tags: create new sortdict for performance reasons

Gregory Szorc gregory.szorc at gmail.com
Sat Nov 21 18:18:13 CST 2015


On Sat, Nov 21, 2015 at 2:06 PM, Augie Fackler <raf at durin42.com> wrote:

> On Thu, Nov 12, 2015 at 01:17:50PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1447362964 28800
> > #      Thu Nov 12 13:16:04 2015 -0800
> > # Node ID c541bc4fc379f76d0cf30354942f78aaddd26f04
> > # Parent  150fb782ef80919e1c8cc7d4b047d78f889c7846
> > tags: create new sortdict for performance reasons
>
> squinting at this lightly, I suspect sortdict could go away in favor
> of collections.ordereddict now...
>

Requires 2.7 :/


>
> >
> > sortdict internally maintains a list of keys in insertion order. When a
> > key is replaced via __setitem__, we .remove() from this list. This
> > involves a linear scan and array adjustment. This is an expensive
> > operation.
> >
> > The tags reading code was calling into sortdict.__setitem__ for each tag
> > in a read .hgtags revision. For repositories with thousands of tags or
> > thousands of .hgtags revisions, the overhead from list.remove()
> > noticeable.
> >
> > This patch creates a new sortdict() so __setitem__ calls don't incur a
> > list.remove.
> >
> > This doesn't appear to have any performance impact on my Firefox
> > repository. But that's only because tags reading doesn't show up in
> > profiles to begin with. I'm still waiting to hear from a user with over
> > 10,000 tags and hundreds of heads on the impact of this patch.
> >
> > diff --git a/mercurial/tags.py b/mercurial/tags.py
> > --- a/mercurial/tags.py
> > +++ b/mercurial/tags.py
> > @@ -220,11 +220,15 @@ def _readtags(ui, repo, lines, fn, recod
> >      All node ids are binary, not hex.
> >      '''
> >      filetags, nodelines = _readtaghist(ui, repo, lines, fn,
> recode=recode,
> >                                         calcnodelines=calcnodelines)
> > +    # util.sortdict().__setitem__ is much slower at replacing then
> inserting
> > +    # new entries. The difference can matter if there are thousands of
> tags.
> > +    # Create a new sortdict to avoid the performance penalty.
> > +    newtags = util.sortdict()
> >      for tag, taghist in filetags.items():
> > -        filetags[tag] = (taghist[-1], taghist[:-1])
> > -    return filetags
> > +        newtags[tag] = (taghist[-1], taghist[:-1])
> > +    return newtags
> >
> >  def _updatetags(filetags, tagtype, alltags, tagtypes):
> >      '''Incorporate the tag info read from one file into the two
> >      dictionaries, alltags and tagtypes, that contain all tag
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151121/4dfed33a/attachment.html>


More information about the Mercurial-devel mailing list