[PATCH STABLE?] tags: create new sortdict for performance reasons
Augie Fackler
raf at durin42.com
Sat Nov 21 16:06:33 CST 2015
On Thu, Nov 12, 2015 at 01:17:50PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1447362964 28800
> # Thu Nov 12 13:16:04 2015 -0800
> # Node ID c541bc4fc379f76d0cf30354942f78aaddd26f04
> # Parent 150fb782ef80919e1c8cc7d4b047d78f889c7846
> tags: create new sortdict for performance reasons
squinting at this lightly, I suspect sortdict could go away in favor
of collections.ordereddict now...
>
> sortdict internally maintains a list of keys in insertion order. When a
> key is replaced via __setitem__, we .remove() from this list. This
> involves a linear scan and array adjustment. This is an expensive
> operation.
>
> The tags reading code was calling into sortdict.__setitem__ for each tag
> in a read .hgtags revision. For repositories with thousands of tags or
> thousands of .hgtags revisions, the overhead from list.remove()
> noticeable.
>
> This patch creates a new sortdict() so __setitem__ calls don't incur a
> list.remove.
>
> This doesn't appear to have any performance impact on my Firefox
> repository. But that's only because tags reading doesn't show up in
> profiles to begin with. I'm still waiting to hear from a user with over
> 10,000 tags and hundreds of heads on the impact of this patch.
>
> diff --git a/mercurial/tags.py b/mercurial/tags.py
> --- a/mercurial/tags.py
> +++ b/mercurial/tags.py
> @@ -220,11 +220,15 @@ def _readtags(ui, repo, lines, fn, recod
> All node ids are binary, not hex.
> '''
> filetags, nodelines = _readtaghist(ui, repo, lines, fn, recode=recode,
> calcnodelines=calcnodelines)
> + # util.sortdict().__setitem__ is much slower at replacing then inserting
> + # new entries. The difference can matter if there are thousands of tags.
> + # Create a new sortdict to avoid the performance penalty.
> + newtags = util.sortdict()
> for tag, taghist in filetags.items():
> - filetags[tag] = (taghist[-1], taghist[:-1])
> - return filetags
> + newtags[tag] = (taghist[-1], taghist[:-1])
> + return newtags
>
> def _updatetags(filetags, tagtype, alltags, tagtypes):
> '''Incorporate the tag info read from one file into the two
> dictionaries, alltags and tagtypes, that contain all tag
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
More information about the Mercurial-devel
mailing list