[PATCH 3 of 3 v3] exchange: support transferring .hgtags fnodes mapping
pierre-yves.david at ens-lyon.org
Wed Jun 3 01:49:42 CDT 2015
On 06/02/2015 08:26 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1432599251 25200
> # Mon May 25 17:14:11 2015 -0700
> # Node ID d89d3b70acaaeaabdf8cf5574e636331f86f2681
> # Parent 651b11ad04814e4cd3b097b88a0ad0e3f5ab1c3c
> exchange: support transferring .hgtags fnodes mapping
This series is pushed to the clowncopter with a couple of minor changes.
Thanks for pursuing this. We can now discuss tracking the content of the
bundle2 created by getbundle if you want to.
> On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
> ~18s from a clean cache on my machine. This means that the first time
> a user runs `hg tags`, `hg log`, or any other command that displays or
> accesses tags data, a ~18s pause will occur. There is no output during
> this pause. This results in a poor user experience and perception
> that Mercurial is slow.
> The .hgtags changeset to filenode mapping is deterministic. This
> patch takes advantage of that property by implementing support
> for transferring .hgtags filenodes mappings in a dedicated bundle2
> part. When a client advertising support for the "hgtagsfnodes"
> capability requests a bundle, a mapping of changesets to .hgtags
> filenodes will be sent to the client.
> Only mappings of head changesets included in the server-generated
> bundle will be sent. The transfer of this mapping effectively eliminates
> one time tags cache related pauses after initial clone.
I've dropped the 'server-generated' in this paragraph.
> The mappings are sent as binary data. So, 40 bytes per pair of
> SHA-1s. On the aforementioned mozilla-beta repository,
> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
> (in addition to the bundle part headers). Assuming 18s to populate
> the cache, we only need to transfer this extra data faster than
> 1.5 KB/s for overall clone + tags cache population time to be shorter.
> Put into perspective, the mozilla-beta repository is ~1 GB in size.
> So, this additional data constitutes <0.01% of the cloned data.
> The marginal overhead for a multi-second performance win on clones
> in my opinion justifies an on-by-default behavior.
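For anyone wanting to check the arithmetic in the commit message, here is a standalone sketch in plain Python (not Mercurial code). The head count and the 18s timing are the figures quoted above; treating "~1 GB" as 10**9 bytes is an assumption made for the example.

```python
# Sanity check of the numbers quoted in the commit message.
# Assumption: "~1 GB" is taken as 10**9 bytes.
HEADS = 659              # mappings sent for mozilla-beta
PAIR_SIZE = 20 + 20      # changeset node + .hgtags filenode, raw SHA-1s
CACHE_FILL_SECONDS = 18  # time to populate the cache locally
CLONE_SIZE = 10**9       # approximate size of the clone in bytes

payload = HEADS * PAIR_SIZE
# Transfer rate at which sending the data takes as long as computing it.
break_even_rate = payload / CACHE_FILL_SECONDS
# Share of the cloned data taken up by the extra part payload.
overhead_pct = 100.0 * payload / CLONE_SIZE

print(payload, round(break_even_rate), round(overhead_pct, 4))
```

The break-even rate comes out a little under the 1.5 KB/s quoted above, and the overhead is well below the 0.01% bound.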
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -11,8 +11,9 @@ from node import hex, nullid
> import errno, urllib
> import util, scmutil, changegroup, base85, error, store
> import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
> import lock as lockmod
> +import tags
> def readbundle(ui, fh, fname, vfs=None):
> header = changegroup.readexactly(fh, 4)
> @@ -1284,8 +1285,49 @@ def _getbundleobsmarkerpart(bundler, rep
> markers = repo.obsstore.relevantmarkers(subset)
> markers = sorted(markers)
> buildobsmarkerspart(bundler, markers)
> +@getbundle2partsgenerator('hgtagsfnodes')
> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
> + b2caps=None, heads=None, common=None,
> + **kwargs):
> + """Transfer the .hgtags filenodes mapping.
> + Only values for heads in this bundle will be transferred.
> + The part data consists of pairs of 20 byte changeset node and .hgtags
> + filenodes raw values.
> + """
> + # Don't send unless the client supports it.
> + if 'hgtagsfnodes' not in b2caps:
> + return
This has been updated to:
+ # Don't send unless:
+ # - changeset are being exchanged,
+ # - the client supports it.
+ if not (kwargs.get('cg', True) and 'hgtagsfnodes' in b2caps):
Server can receive getbundle request for no changeset content (phases,
> + outgoing = changegroup.computeoutgoing(repo, heads, common)
> + if not outgoing.missingheads:
> + return
> + cache = tags.hgtagsfnodescache(repo.unfiltered())
> + chunks = []
> + # .hgtags fnodes are only relevant for head changesets. While we could
> + # transfer values for all known nodes, there will likely be little to
> + # no benefit.
> + #
> + # We don't bother using a generator to produce output data because
> + # a) we only have 40 bytes per head and even esoteric numbers of heads
> + # consume little memory (1M heads is 40MB) b) we don't want to send the
> + # part if we don't have entries and knowing if we have entries requires
> + # cache lookups.
Thanks for the detailed comment, the next poor soul who gets lost in this
function will appreciate it.
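To make the part data layout from the docstring concrete (pairs of 20 byte changeset node and .hgtags filenode, so 40 bytes per head), here is a minimal standalone sketch. It is not the code from the patch: encode_fnodes and decode_fnodes are hypothetical helper names, the nodes are fake SHA-1 digests, and rejecting a payload that is not a multiple of 40 bytes is an assumption about how a receiver might behave.

```python
import hashlib

NODE_LEN = 20  # length of a raw SHA-1 in bytes


def encode_fnodes(pairs):
    """Concatenate (changeset node, .hgtags filenode) pairs as raw bytes.

    Hypothetical helper: builds the whole payload in a list first, in the
    spirit of the 'chunks' list described in the comment above.
    """
    chunks = []
    for node, fnode in pairs:
        if len(node) != NODE_LEN or len(fnode) != NODE_LEN:
            raise ValueError('nodes must be 20 raw bytes')
        chunks.append(node)
        chunks.append(fnode)
    return b''.join(chunks)


def decode_fnodes(data):
    """Split a payload back into (node, fnode) pairs.

    Assumption: a payload that is not a multiple of 40 bytes is rejected.
    """
    record = 2 * NODE_LEN
    if len(data) % record:
        raise ValueError('payload is not a multiple of 40 bytes')
    return [(data[i:i + NODE_LEN], data[i + NODE_LEN:i + record])
            for i in range(0, len(data), record)]


# Fake nodes for illustration only; real values come from the repository.
pairs = [(hashlib.sha1(('cset%d' % i).encode('ascii')).digest(),
          hashlib.sha1(('fnode%d' % i).encode('ascii')).digest())
         for i in range(3)]
payload = encode_fnodes(pairs)
roundtrip = decode_fnodes(payload)
```

With three heads the payload is 120 bytes, and decoding it returns the original pairs in order.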