D1394: bundle2: avoid unbound read when seeking

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Tue Nov 14 01:26:24 EST 2017


indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Currently, seekableunbundlepart.seek() will perform a read() during
  seek operations. This will allocate a buffer to hold the raw data
  over the seek distance. This can lead to very large allocations
  and cause performance to suffer.
  
  We change the code to perform read(32768) in a loop to avoid
  potentially large allocations.
  
  `hg perfbundleread` on an uncompressed Firefox bundle reveals
  a performance impact:
  
  ! bundle2 iterparts()
  ! wall 2.992605 comb 2.990000 user 2.260000 sys 0.730000 (best of 4)
  ! bundle2 iterparts() seekable
  ! wall 3.863810 comb 3.860000 user 3.000000 sys 0.860000 (best of 3)
  ! bundle2 part seek()
  ! wall 6.213387 comb 6.200000 user 3.350000 sys 2.850000 (best of 3)
  ! wall 3.820347 comb 3.810000 user 2.980000 sys 0.830000 (best of 3)
  
  Since seekable bundle parts are (only) used by bundlerepo, this /may/
  speed up initial loading of bundle-based repos. But any improvement
  will likely only be noticed on very large bundles.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1394

AFFECTED FILES
  mercurial/bundle2.py

CHANGE DETAILS

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -1415,13 +1415,20 @@
             newpos = self._pos + offset
         elif whence == os.SEEK_END:
             if not self.consumed:
-                self.read()
+                # Can't use self.consume() here because it advances self._pos.
+                chunk = self.read(32768)
+                while chunk:
+                    chunk = self.read(32768)
             newpos = self._chunkindex[-1][0] - offset
         else:
             raise ValueError('Unknown whence value: %r' % (whence,))
 
         if newpos > self._chunkindex[-1][0] and not self.consumed:
-            self.read()
+            # Can't use self.consume() here because it advances self._pos.
+            chunk = self.read(32768)
+            while chunk:
+                chunk = self.read(32668)
+
         if not 0 <= newpos <= self._chunkindex[-1][0]:
             raise ValueError('Offset out of range')
 



To: indygreg, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list