[PATCH 3 of 3] revlog: optionally cache the full text when adding revisions

Augie Fackler raf at durin42.com
Sun Sep 13 21:38:18 CDT 2015


On Sat, Sep 12, 2015 at 04:56:12PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1442099477 25200
> #      Sat Sep 12 16:11:17 2015 -0700
> # Node ID 046c9fd2a75c7b5beca3e4ae04c1ced382530c44
> # Parent  9b0f250a7ac06af053725d735aa551eed9d3e66b
> revlog: optionally cache the full text when adding revisions

queued, very nice wins

>
> revlog instances can cache the full text of a single revision. Typically
> the most recently read revision is cached.
>
> When adding a delta group via addgroup() and _addrevision(), the
> full text isn't always computed: sometimes the passed-in delta alone is
> sufficient for adding a new revision to the revlog.
>
> When writing the changelog from a delta group, the just-added full
> text revision is always read immediately after it is written because
> the changegroup code needs to extract the set of files from the entry.
> In other words, revision() is *always* being called and caching the full
> text of the just-added revision is guaranteed to result in a cache hit,
> making the cache worthwhile.
>
> This patch adds support to _addrevision() for always building and
> caching the full text. This option is currently only active when
> processing changelog entries from a changegroup.
>
> While the total number of revision() calls is the same, the location
> matters: buildtext() calls into revision() on the base revision when
> building the full text of the just-added revision. Since the previous
> revision's _addrevision() built the full text and the previous
> revision is likely the base revision, this means that the base
> revision's full text is likely cached and can be used to compute the
> current full text from just a delta. No extra I/O required.
>
> The end result is the changelog isn't opened and read after adding every
> revision from a changegroup.
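
For anyone skimming the thread, here is a rough, self-contained toy model
of the caching argument above. It is only a sketch and not revlog code:
ToyRevlog, add_revision, apply_group and storage_reads are made-up names,
a "delta" is just a string suffix appended to the base text (real revlog
deltas are binary patches), and the flush() handling from the patch is
left out. It only illustrates why building and caching the full text at
add time means the follow-up revision() call never goes back to storage.

```python
class ToyRevlog:
    """Hypothetical append-only delta store with a one-entry text cache."""

    def __init__(self):
        self._deltas = []       # stands in for on-disk data: (base_rev, delta)
        self._cache = None      # (rev, full_text) for at most one revision
        self.storage_reads = 0  # how often we had to go back to "disk"

    def _read_delta(self, rev):
        # Every call here models re-opening and reading the revlog.
        self.storage_reads += 1
        return self._deltas[rev]

    def revision(self, rev):
        """Return the full text of rev, using the cache when possible."""
        if rev == -1:           # null revision: empty text
            return ""
        if self._cache is not None and self._cache[0] == rev:
            return self._cache[1]
        base_rev, delta = self._read_delta(rev)
        text = self.revision(base_rev) + delta
        self._cache = (rev, text)
        return text

    def add_revision(self, base_rev, delta, alwayscache=False):
        """Store a delta; optionally build and cache the full text."""
        self._deltas.append((base_rev, delta))
        rev = len(self._deltas) - 1
        if alwayscache:
            # The delta is already in memory and the base is usually the
            # cached previous revision, so no storage read is needed.
            text = self.revision(base_rev) + delta
            self._cache = (rev, text)
        return rev


def apply_group(deltas, alwayscache):
    """Add a chain of deltas, reading each revision back after adding it."""
    log = ToyRevlog()
    seen = []
    base = -1
    for delta in deltas:
        rev = log.add_revision(base, delta, alwayscache=alwayscache)
        seen.append(log.revision(rev))  # stands in for addrevisioncb
        base = rev
    return seen, log.storage_reads


texts_plain, reads_plain = apply_group(["a", "b", "c"], alwayscache=False)
texts_cached, reads_cached = apply_group(["a", "b", "c"], alwayscache=True)
```

In this toy, both runs produce the same full texts, but the run without
alwayscache does one storage read per added revision while the run with
it does none.
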
>
> On my 2013 MacBook Pro running OS X 10.10.5 from an SSD and Python 2.7,
> this patch impacted the time taken to apply ~262,000 changesets from a
> mozilla-central gzip bundle:
>
>   before: ~43s
>   after:  ~32s
>
> ~25% reduction in changelog processing times. Not bad.
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1254,9 +1254,9 @@ class revlog(object):
>
>          return True
>
>      def _addrevision(self, node, text, transaction, link, p1, p2, flags,
> -                     cachedelta, ifh, dfh):
> +                     cachedelta, ifh, dfh, alwayscache=False):
>          """internal function to add revisions to the log
>
>          see addrevision for argument descriptions.
>          invariants:
> @@ -1390,8 +1390,11 @@ class revlog(object):
>
>          entry = self._io.packentry(e, self.node, self.version, curr)
>          self._writeentry(transaction, ifh, dfh, entry, data, link, offset)
>
> +        if alwayscache and text is None:
> +            text = buildtext()
> +
>          if type(text) == str: # only accept immutable objects
>              self._cache = (node, curr, text)
>          self._basecache = (curr, chainbase)
>          return node
> @@ -1493,17 +1496,18 @@ class revlog(object):
>                  flags = REVIDX_DEFAULT_FLAGS
>                  if self._peek_iscensored(baserev, delta, flush):
>                      flags |= REVIDX_ISCENSORED
>
> +                # We assume consumers of addrevisioncb will want to retrieve
> +                # the added revision, which will require a call to
> +                # revision(). revision() will fast path if there is a cache
> +                # hit. So, we tell _addrevision() to always cache in this case.
>                  chain = self._addrevision(node, None, transaction, link,
>                                            p1, p2, flags, (baserev, delta),
> -                                          ifh, dfh)
> +                                          ifh, dfh,
> +                                          alwayscache=bool(addrevisioncb))
>
>                  if addrevisioncb:
> -                    # Data for added revision can't be read unless flushed
> -                    # because _loadchunk always opensa new file handle and
> -                    # there is no guarantee data was actually written yet.
> -                    flush()
>                      addrevisioncb(self, chain)
>
>                  if not dfh and not self._inline:
>                      # addrevision switched from inline to conventional
