[PATCH 2 of 4] revlog: add a context manager to allow file handle reuse
Gregory Szorc
gregory.szorc at gmail.com
Tue Nov 1 21:16:37 EDT 2016
# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1477099818 25200
# Fri Oct 21 18:30:18 2016 -0700
# Node ID d631065a702fa7eb956258e2289679d5902ccff6
# Parent fb93d9a0a24db5a93a6a6758eacc6ba5ca37531e
revlog: add a context manager to allow file handle reuse
Currently, read-only operations traversing revlogs must open a new
file handle whenever they need uncached data from the underlying
revlog. This can add overhead to operations such as changegroup
generation.
The revlog APIs already have a mechanism for reusing a file handle
for I/O. I added it a while ago as a means to speed up
revlog.addgroup(). At that time, I didn't do any work on
reusing file handles for read operations.
This patch introduces a context manager to cache an open file handle
on a revlog. While the context manager is active, revlog reads are
routed to the cached file handle instead of opening a new single-use
file handle for each read.
There is definitely room to improve the API. We could probably
even refactor the write file descriptor caching to use a context
manager - possibly the same one! However, this is a bit of work
since the write file handle can be swapped out if a revlog
transitions from inline to non-inline in the course of adding
revisions.
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -14,6 +14,7 @@ and O(changes) merge between branches.
from __future__ import absolute_import
import collections
+import contextlib
import errno
import hashlib
import os
@@ -314,6 +315,8 @@ class revlog(object):
         # revnum -> (chain-length, sum-delta-length)
         self._chaininfocache = {}
+        self._readfh = None
+
     def tip(self):
         return self.node(len(self.index) - 2)
     def __contains__(self, rev):
@@ -1018,6 +1021,30 @@ class revlog(object):
         else:
             self._chunkcache = offset, data
+    @contextlib.contextmanager
+    def cachefilehandle(self):
+        """Maintain a persistent file handle during operations.
+
+        When this context manager is active, a file descriptor will be reused
+        for all read operations, ensuring the underlying revlog file isn't
+        reopened multiple times.
+        """
+        if self._readfh:
+            raise error.Abort('cachefilehandle already active')
+
+        # Inline revlogs have data chunks cached at open time. If we opened
+        # a file handle it wouldn't be read.
+        if self._inline:
+            yield
+            return
+
+        self._readfh = self.opener(self.datafile)
+        try:
+            yield
+        finally:
+            self._readfh.close()
+            self._readfh = None
+
     def _loadchunk(self, offset, length, df=None):
         """Load a segment of raw data from the revlog.
@@ -1029,6 +1056,12 @@ class revlog(object):
         Returns a str or buffer of raw byte data.
         """
+        # Use cached file handle from context manager automatically.
+        # self._readfh is always opened in read-only mode. df may be opened
+        # in append mode. The context manager should only be active during
+        # read operations. So this mismatch is OK.
+        df = df or self._readfh
+
         if df is not None:
             closehandle = False
         else:
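
For readers who want to try the pattern outside of Mercurial, here is a
minimal standalone sketch of the idea in this patch. It is not revlog code:
ToyLog, its read() method, the opencount counter and demo() are all
hypothetical names invented for illustration. Only the cachefilehandle name
and the "prefer the cached handle in the read path" logic mirror the patch.

```python
import contextlib
import os
import tempfile


class ToyLog(object):
    """Hypothetical stand-in for a revlog backed by a single data file."""

    def __init__(self, path):
        self._path = path
        self._readfh = None
        self.opencount = 0  # instrumentation for this sketch only

    def _open(self):
        self.opencount += 1
        return open(self._path, 'rb')

    @contextlib.contextmanager
    def cachefilehandle(self):
        if self._readfh is not None:
            raise RuntimeError('cachefilehandle already active')
        # Open before entering the try block so that a failed open is
        # not masked by the cleanup code.
        fh = self._open()
        self._readfh = fh
        try:
            yield
        finally:
            self._readfh = None
            fh.close()

    def read(self, offset, length, df=None):
        # Prefer an explicitly passed handle, then the cached one.
        if df is None:
            df = self._readfh
        closehandle = df is None
        if closehandle:
            df = self._open()
        try:
            df.seek(offset)
            return df.read(length)
        finally:
            if closehandle:
                df.close()


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as fh:
            fh.write(b'0123456789')
        log = ToyLog(path)
        # Without the context manager: one open per read.
        uncached = [log.read(i, 2) for i in range(0, 6, 2)]
        opens_uncached = log.opencount
        log.opencount = 0
        # With the context manager: a single open shared by all reads.
        with log.cachefilehandle():
            cached = [log.read(i, 2) for i in range(0, 6, 2)]
        opens_cached = log.opencount
        return uncached, cached, opens_uncached, opens_cached
    finally:
        os.remove(path)
```

In this sketch the three reads outside the context manager each open the
file, while the three reads inside it share one handle. A caller of the real
API would presumably wrap a read-heavy loop the same way, in a
"with rl.cachefilehandle():" block.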