D4075: changegroup: capture revision delta in a data structure

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Fri Aug 3 21:04:48 UTC 2018


indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The current changegroup generation code is tightly coupled to
  the revlog API. This tight coupling makes it difficult to implement
  alternate storage backends without requiring a large surface area
  of the revlog API to be exposed. This is not desirable.
  
  In order to support changegroup generation with non-revlog storage,
  we'll need to abstract the concept of delta generation.
  
  This commit is the first step down that road. We introduce a
  data structure for representing a delta in a changegroup.
  
  The API still leaves a lot to be desired. But at least we now
  have separation between data and actions performed on it.
  
  As part of this, we tweak behavior slightly: we no longer
  concatenate the delta prefix with the metadata header. Instead,
  we track and emit the prefix as a separate chunk. This shouldn't
  have any meaningful impact since all the chunks just get sent to
  the wire, the compressor, etc.
  
  Because we're introducing a new object, this does add some
  overhead to changegroup execution. `hg perfchangegroupchangelog`
  on my clone of the Mercurial repo (~40,000 visible revisions in
  the changelog) slows down a bit:
  
  ! wall 1.268600 comb 1.270000 user 1.270000 sys 0.000000 (best of 8)
  ! wall 1.419479 comb 1.410000 user 1.410000 sys 0.000000 (best of 8)
  
  With for `hg bundle -t none-v2 -a /dev/null`:
  
  before: real 6.610 secs (user 6.460+0.000 sys 0.140+0.000)
  after:  real 7.210 secs (user 7.060+0.000 sys 0.140+0.000)
  
  I plan to claw back this regression in future commits. And I may
  even do away with this data structure once the refactor is complete.
  For now, it makes things easier to comprehend.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4075

AFFECTED FILES
  mercurial/changegroup.py

CHANGE DETAILS

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -19,6 +19,10 @@
     short,
 )
 
+from .thirdparty import (
+    attr,
+)
+
 from . import (
     dagutil,
     error,
@@ -497,6 +501,26 @@
             return d
         return readexactly(self._fh, n)
 
+ at attr.s(slots=True, frozen=True)
+class revisiondelta(object):
+    """Describes a delta entry in a changegroup.
+
+    Captured data is sufficient to serialize the delta into multiple
+    formats.
+    """
+    # 20 byte node of this revision.
+    node = attr.ib()
+    # 20 byte nodes of parent revisions.
+    p1node = attr.ib()
+    p2node = attr.ib()
+    # 20 byte node of node this delta is against.
+    basenode = attr.ib()
+    # 20 byte node of changeset revision this delta is associated with.
+    linknode = attr.ib()
+    # 2 bytes of flags to apply to revision data.
+    flags = attr.ib()
+    # Iterable of chunks holding raw delta data.
+    deltachunks = attr.ib()
 
 class cg1packer(object):
     deltaheader = _CHANGEGROUPV1_DELTA_HEADER
@@ -902,13 +926,25 @@
 
     def revchunk(self, store, rev, prev, linknode):
         if util.safehasattr(self, 'full_nodes'):
-            fn = self._revchunknarrow
+            fn = self._revisiondeltanarrow
         else:
-            fn = self._revchunknormal
+            fn = self._revisiondeltanormal
+
+        delta = fn(store, rev, prev, linknode)
+        if not delta:
+            return
 
-        return fn(store, rev, prev, linknode)
+        meta = self.builddeltaheader(delta.node, delta.p1node, delta.p2node,
+                                     delta.basenode, delta.linknode,
+                                     delta.flags)
+        l = len(meta) + sum(len(x) for x in delta.deltachunks)
 
-    def _revchunknormal(self, store, rev, prev, linknode):
+        yield chunkheader(l)
+        yield meta
+        for x in delta.deltachunks:
+            yield x
+
+    def _revisiondeltanormal(self, store, rev, prev, linknode):
         node = store.node(rev)
         p1, p2 = store.parentrevs(rev)
         base = self.deltaparent(store, rev, p1, p2, prev)
@@ -930,16 +966,18 @@
         else:
             delta = store.revdiff(base, rev)
         p1n, p2n = store.parents(node)
-        basenode = store.node(base)
-        flags = store.flags(rev)
-        meta = self.builddeltaheader(node, p1n, p2n, basenode, linknode, flags)
-        meta += prefix
-        l = len(meta) + len(delta)
-        yield chunkheader(l)
-        yield meta
-        yield delta
 
-    def _revchunknarrow(self, store, rev, prev, linknode):
+        return revisiondelta(
+            node=node,
+            p1node=p1n,
+            p2node=p2n,
+            basenode=store.node(base),
+            linknode=linknode,
+            flags=store.flags(rev),
+            deltachunks=(prefix, delta),
+        )
+
+    def _revisiondeltanarrow(self, store, rev, prev, linknode):
         # build up some mapping information that's useful later. See
         # the local() nested function below.
         if not self.changelog_done:
@@ -953,9 +991,7 @@
         # This is a node to send in full, because the changeset it
         # corresponds to was a full changeset.
         if linknode in self.full_nodes:
-            for x in self._revchunknormal(store, rev, prev, linknode):
-                yield x
-            return
+            return self._revisiondeltanormal(store, rev, prev, linknode)
 
         # At this point, a node can either be one we should skip or an
         # ellipsis. If it's not an ellipsis, bail immediately.
@@ -1046,16 +1082,20 @@
         p1n, p2n = store.node(p1), store.node(p2)
         flags = store.flags(rev)
         flags |= revlog.REVIDX_ELLIPSIS
-        meta = self.builddeltaheader(
-            n, p1n, p2n, nullid, linknode, flags)
+
         # TODO: try and actually send deltas for ellipsis data blocks
         data = store.revision(n)
         diffheader = mdiff.trivialdiffheader(len(data))
-        l = len(meta) + len(diffheader) + len(data)
-        yield ''.join((chunkheader(l),
-                       meta,
-                       diffheader,
-                       data))
+
+        return revisiondelta(
+            node=n,
+            p1node=p1n,
+            p2node=p2n,
+            basenode=nullid,
+            linknode=linknode,
+            flags=flags,
+            deltachunks=(diffheader, data),
+        )
 
     def builddeltaheader(self, node, p1n, p2n, basenode, linknode, flags):
         # do nothing with basenode, it is implicitly the previous one in HG10



To: indygreg, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list