[PATCH 2 of 4] revlog: add a context manager to allow file handle reuse

Gregory Szorc gregory.szorc at gmail.com
Tue Nov 1 21:16:37 EDT 2016


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1477099818 25200
#      Fri Oct 21 18:30:18 2016 -0700
# Node ID d631065a702fa7eb956258e2289679d5902ccff6
# Parent  fb93d9a0a24db5a93a6a6758eacc6ba5ca37531e
revlog: add a context manager to allow file handle reuse

Currently, read-only operations traversing revlogs must open a new
file handle whenever they need uncached data from the underlying
revlog. This can add overhead to operations such as changegroup
generation.

The revlog APIs have a mechanism for reusing a file descriptor
for I/O. This was added by me a while ago as a means to speed up
revlog.addgroup(). At that time, I didn't do any work around
reusing file handles for read operations.

This patch introduces a context manager to cache an open file handle
on a revlog. When the context manager is active, revlog reads will
be routed to the opened file handle instead of opening a single
use file handle.

There is definitely room to improve the API. We could probably
even refactor the write file descriptor caching to use a context
manager - possibly the same one! However, this is a bit of work
since the write file handle can be swapped out if a revlog
transitions from inline to non-inline in the course of adding
revisions.

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -14,6 +14,7 @@ and O(changes) merge between branches.
 from __future__ import absolute_import
 
 import collections
+import contextlib
 import errno
 import hashlib
 import os
@@ -314,6 +315,8 @@ class revlog(object):
         # revnum -> (chain-length, sum-delta-length)
         self._chaininfocache = {}
 
+        self._readfh = None
+
     def tip(self):
         return self.node(len(self.index) - 2)
     def __contains__(self, rev):
@@ -1018,6 +1021,30 @@ class revlog(object):
         else:
             self._chunkcache = offset, data
 
+    @contextlib.contextmanager
+    def cachefilehandle(self):
+        """Maintain a persistent file handle during operations.
+
+        When this context manager is active, a file descriptor will be reused
+        for all read operations, ensuring the underlying revlog file isn't
+        reopened multiple times.
+        """
+        if self._readfh:
+            raise error.Abort('cachefilehandle already active')
+
+        # Inline revlogs have data chunks cached at open time. If we opened
+        # a file handle it wouldn't be read.
+        if self._inline:
+            yield
+            return
+
+        try:
+            self._readfh = self.opener(self.datafile)
+            yield
+        finally:
+            self._readfh.close()
+            self._readfh = None
+
     def _loadchunk(self, offset, length, df=None):
         """Load a segment of raw data from the revlog.
 
@@ -1029,6 +1056,12 @@ class revlog(object):
 
         Returns a str or buffer of raw byte data.
         """
+        # Use cached file handle from context manager automatically.
+        # self._readfh is always opened in read-only mode. df may be opened
+        # in append mode. The context manager should only be active during
+        # read operations. So this mismatch is OK.
+        df = df or self._readfh
+
         if df is not None:
             closehandle = False
         else:


More information about the Mercurial-devel mailing list