[PATCH STABLE?] tags: create new sortdict for performance reasons

Augie Fackler raf at durin42.com
Sat Nov 21 16:06:33 CST 2015


On Thu, Nov 12, 2015 at 01:17:50PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1447362964 28800
> #      Thu Nov 12 13:16:04 2015 -0800
> # Node ID c541bc4fc379f76d0cf30354942f78aaddd26f04
> # Parent  150fb782ef80919e1c8cc7d4b047d78f889c7846
> tags: create new sortdict for performance reasons

squinting at this lightly, I suspect sortdict could go away in favor
of collections.ordereddict now...

>
> sortdict internally maintains a list of keys in insertion order. When a
> key is replaced via __setitem__, we .remove() from this list. This
> involves a linear scan and array adjustment. This is an expensive
> operation.
>
> The tags reading code was calling into sortdict.__setitem__ for each tag
> in a read .hgtags revision. For repositories with thousands of tags or
> thousands of .hgtags revisions, the overhead from list.remove()
> noticeable.
>
> This patch creates a new sortdict() so __setitem__ calls don't incur a
> list.remove.
>
> This doesn't appear to have any performance impact on my Firefox
> repository. But that's only because tags reading doesn't show up in
> profiles to begin with. I'm still waiting to hear from a user with over
> 10,000 tags and hundreds of heads on the impact of this patch.
>
> diff --git a/mercurial/tags.py b/mercurial/tags.py
> --- a/mercurial/tags.py
> +++ b/mercurial/tags.py
> @@ -220,11 +220,15 @@ def _readtags(ui, repo, lines, fn, recod
>      All node ids are binary, not hex.
>      '''
>      filetags, nodelines = _readtaghist(ui, repo, lines, fn, recode=recode,
>                                         calcnodelines=calcnodelines)
> +    # util.sortdict().__setitem__ is much slower at replacing then inserting
> +    # new entries. The difference can matter if there are thousands of tags.
> +    # Create a new sortdict to avoid the performance penalty.
> +    newtags = util.sortdict()
>      for tag, taghist in filetags.items():
> -        filetags[tag] = (taghist[-1], taghist[:-1])
> -    return filetags
> +        newtags[tag] = (taghist[-1], taghist[:-1])
> +    return newtags
>
>  def _updatetags(filetags, tagtype, alltags, tagtypes):
>      '''Incorporate the tag info read from one file into the two
>      dictionaries, alltags and tagtypes, that contain all tag
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list