[PATCH 03 of 10 lazy-changelog-parse] changelog: lazily parse description
Gregory Szorc
gregory.szorc at gmail.com
Sun Mar 6 18:58:49 EST 2016
# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1457303326 28800
# Sun Mar 06 14:28:46 2016 -0800
# Node ID d85951413907594c2cb37744ce8b01de2b030930
# Parent 45c41cbfe73e7d685a8831cb73e0064eddc6d33e
changelog: lazily parse description
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert it to
a localstr each time it is accessed.
We see a revset speedup for revsets that don't access the description:
author(mpm)
0.896565
0.914234
0.869085
date(2015)
0.878797
0.891980
0.862525
extra(rebase_source)
0.865446
0.912514
0.871500
author(mpm) or author(greg)
1.801832
1.860402
1.791589
date(2015) or branch(default)
0.968276
0.994673
0.974027
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593
As you can see, most of these revsets are already faster than they
were before this refactoring: we have already offset the performance
loss from introducing the new class that represents parsed
changelog entries!
diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -147,17 +147,17 @@ class changelogrevision(object):
Changelog revisions consist of multiple pieces of data, including
the manifest node, user, and date. This object exposes a view into
the parsed object.
"""
__slots__ = (
'date',
- 'description',
+ '_rawdesc',
'extra',
'files',
'manifest',
'user',
)
def __new__(cls, text):
if not text:
@@ -180,19 +180,20 @@ class changelogrevision(object):
# time tz extra\n : date (time is int or float, timezone is int)
# : extra is metadata, encoded and separated by '\0'
# : older versions ignore it
# files\n\n : files modified by the cset, no \n or \r allowed
# (.*) : comment (free text, ideally utf-8)
#
# changelog v0 doesn't use extra
- last = text.index("\n\n")
- self.description = encoding.tolocal(text[last + 2:])
- l = text[:last].split('\n')
+ doublenl = text.index('\n\n')
+ self._rawdesc = text[doublenl + 2:]
+
+ l = text[:doublenl].split('\n')
self.manifest = bin(l[0])
self.user = encoding.tolocal(l[1])
tdata = l[2].split(' ', 2)
if len(tdata) != 3:
time = float(tdata[0])
try:
# various tools did silly things with the time zone field.
@@ -204,16 +205,20 @@ class changelogrevision(object):
time, timezone = float(tdata[0]), int(tdata[1])
self.extra = decodeextra(tdata[2])
self.date = (time, timezone)
self.files = l[3:]
return self
+ @property
+ def description(self):
+ return encoding.tolocal(self._rawdesc)
+
class changelog(revlog.revlog):
def __init__(self, opener):
revlog.revlog.__init__(self, opener, "00changelog.i")
if self._initempty:
# changelogs don't benefit from generaldelta
self.version &= ~revlog.REVLOGGENERALDELTA
self._generaldelta = False
self._realopener = opener
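
For readers who want to try the pattern outside of Mercurial, here is a
minimal standalone sketch of what the patch does: keep the raw
description bytes at parse time and convert them only inside a property.
The names LazyRevision and the local tolocal() are hypothetical stand-ins
(the real code uses changelogrevision and encoding.tolocal), the sample
text is made up, and the sketch targets Python 3 bytes rather than the
Python 2 strings the patch was written against. The call counter exists
only to make the laziness visible.

```python
def tolocal(raw):
    """Hypothetical stand-in for encoding.tolocal: decode UTF-8 bytes."""
    tolocal.calls += 1
    return raw.decode('utf-8', 'replace')

# Count conversions so the effect of deferring them can be observed.
tolocal.calls = 0


class LazyRevision(object):
    """Simplified, assumed model of a parsed changelog entry."""

    __slots__ = ('_rawdesc', 'user')

    def __init__(self, text):
        # Header and description are separated by the first blank line.
        doublenl = text.index(b'\n\n')
        # Store the description unconverted; parsing pays no decode cost.
        self._rawdesc = text[doublenl + 2:]
        header = text[:doublenl].split(b'\n')
        self.user = header[1].decode('utf-8')

    @property
    def description(self):
        # Conversion happens here, on every access (no caching).
        return tolocal(self._rawdesc)


rev = LazyRevision(b'manifestnode\nmpm\n0 0\nfile.txt\n\nfix a bug')
print(rev.user, tolocal.calls)         # description not converted yet
print(rev.description, tolocal.calls)  # converted on access
```

Callers that only look at the user (as author() does in the revsets
above) never trigger the conversion. Because the property does not cache
its result, code that reads the description repeatedly pays for the
conversion each time; whether that matters depends on the caller.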