[PATCH 3 of 3] revlog: optionally cache the full text when adding revisions
raf at durin42.com
Sun Sep 13 21:38:18 CDT 2015
On Sat, Sep 12, 2015 at 04:56:12PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1442099477 25200
> # Sat Sep 12 16:11:17 2015 -0700
> # Node ID 046c9fd2a75c7b5beca3e4ae04c1ced382530c44
> # Parent 9b0f250a7ac06af053725d735aa551eed9d3e66b
> revlog: optionally cache the full text when adding revisions
queued, very nice wins
> revlog instances can cache the full text of a single revision. Typically
> the most recently read revision is cached.
> When adding a delta group via addgroup() and _addrevision(), the
> full text isn't always computed: sometimes the passed-in delta alone is
> sufficient for adding a new revision to the revlog.
> When writing the changelog from a delta group, the just-added full
> text revision is always read immediately after it is written because
> the changegroup code needs to extract the set of files from the entry.
> In other words, revision() is *always* being called and caching the full
> text of the just-added revision is guaranteed to result in a cache hit,
> making the cache worthwhile.
> This patch adds support to _addrevision() for always building and
> caching the full text. This option is currently only active when
> processing changelog entries from a changegroup.
> While the total number of revision() calls is the same, the location
> matters: buildtext() calls into revision() on the base revision when
> building the full text of the just-added revision. Since the previous
> revision's _addrevision() built the full text and the previous
> revision is likely the base revision, this means that the base
> revision's full text is likely cached and can be used to compute the
> current full text from just a delta. No extra I/O required.
> The end result is the changelog isn't opened and read after adding every
> revision from a changegroup.
> On my 2013 MacBook Pro running OS X 10.10.5 from an SSD and Python 2.7,
> this patch impacted the time taken to apply ~262,000 changesets from a
> mozilla-central gzip bundle:
> before: ~43s
> after: ~32s
> ~25% reduction in changelog processing times. Not bad.
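As a quick check of the arithmetic, the quoted figures can be plugged in directly. The inputs are the approximate values reported above (~43s, ~32s, ~262,000 changesets), so the results are only as precise as those; the per-changeset figure is a derived estimate, not a measurement from the patch.

```python
before_s = 43.0        # approximate time before the patch, from the message
after_s = 32.0         # approximate time after the patch, from the message
changesets = 262_000   # approximate number of changesets applied

# Relative reduction in wall-clock time.
reduction = (before_s - after_s) / before_s

# Average time saved per changeset, in microseconds (derived estimate).
saved_us = (before_s - after_s) / changesets * 1e6

print(f"reduction: {reduction:.1%}")
print(f"saved per changeset: {saved_us:.0f} us")
```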
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1254,9 +1254,9 @@ class revlog(object):
> return True
> def _addrevision(self, node, text, transaction, link, p1, p2, flags,
> - cachedelta, ifh, dfh):
> + cachedelta, ifh, dfh, alwayscache=False):
> """internal function to add revisions to the log
> see addrevision for argument descriptions.
> @@ -1390,8 +1390,11 @@ class revlog(object):
> entry = self._io.packentry(e, self.node, self.version, curr)
> self._writeentry(transaction, ifh, dfh, entry, data, link, offset)
> + if alwayscache and text is None:
> + text = buildtext()
> if type(text) == str: # only accept immutable objects
> self._cache = (node, curr, text)
> self._basecache = (curr, chainbase)
> return node
> @@ -1493,17 +1496,18 @@ class revlog(object):
> flags = REVIDX_DEFAULT_FLAGS
> if self._peek_iscensored(baserev, delta, flush):
> flags |= REVIDX_ISCENSORED
> + # We assume consumers of addrevisioncb will want to retrieve
> + # the added revision, which will require a call to
> + # revision(). revision() will fast path if there is a cache
> + # hit. So, we tell _addrevision() to always cache in this case.
> chain = self._addrevision(node, None, transaction, link,
> p1, p2, flags, (baserev, delta),
> - ifh, dfh)
> + ifh, dfh,
> + alwayscache=bool(addrevisioncb))
> if addrevisioncb:
> - # Data for added revision can't be read unless flushed
> - # because _loadchunk always opensa new file handle and
> - # there is no guarantee data was actually written yet.
> - flush()
> addrevisioncb(self, chain)
> if not dfh and not self._inline:
> # addrevision switched from inline to conventional
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com