[PATCH 3 of 3 v3] exchange: support transferring .hgtags fnodes mapping

Wed Jun 3 01:49:42 CDT 2015

On 06/02/2015 08:26 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1432599251 25200
> #      Mon May 25 17:14:11 2015 -0700
> # Node ID d89d3b70acaaeaabdf8cf5574e636331f86f2681
> # Parent  651b11ad04814e4cd3b097b88a0ad0e3f5ab1c3c
> exchange: support transferring .hgtags fnodes mapping

This series has been pushed to the clowncopter with a couple of minor changes.
Thanks for pursuing this. We can now discuss tracking the content of the
bundle2 created by getbundle, if you want to.

> On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
> ~18s from a clean cache on my machine. This means that the first time
> a user runs `hg tags`, `hg log`, or any other command that displays or
> accesses tags data, a ~18s pause will occur. There is no output during
> this pause. This results in a poor user experience and perception
> that Mercurial is slow.
>
> The .hgtags changeset to filenode mapping is deterministic. This
> patch takes advantage of that property by implementing support
> for transferring .hgtags filenodes mappings in a dedicated bundle2
> part. When a client advertising support for the "hgtagsfnodes"
> capability requests a bundle, a mapping of changesets to .hgtags
> filenodes will be sent to the client.
>
> Only mappings of head changesets included in the server-generated
> bundle will be sent. The transfer of this mapping effectively eliminates
> one time tags cache related pauses after initial clone.

I've dropped the 'server-generated' in this paragraph.

> The mappings are sent as binary data. So, 40 bytes per pair of
> SHA-1s. On the aforementioned mozilla-beta repository,
> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
> (in addition to the bundle part headers). Assuming 18s to populate
> the cache, we only need to transfer this extra data faster than
> 1.5 KB/s for overall clone + tags cache population time to be shorter.
> Put into perspective, the mozilla-beta repository is ~1 GB in size.
> So, this additional data constitutes <0.01% of the cloned data.
> The marginal overhead for a multi-second performance win on clones
> in my opinion justifies an on-by-default behavior.
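
The figures in the paragraph above can be re-derived with a few lines of
Python. This is only a sanity-check sketch: the constants come from the
commit message, "~1 GB" is assumed to mean 10**9 bytes, and the helper names
are made up for illustration.

```python
# Figures quoted in the commit message.
HEADS = 659            # head changesets in mozilla-beta
BYTES_PER_PAIR = 40    # 20-byte changeset node + 20-byte .hgtags filenode
CACHE_SECONDS = 18     # time to populate the tags cache locally
CLONE_BYTES = 10 ** 9  # "~1 GB", assumed here to be 10**9 bytes


def mapping_bytes(heads, pair_size=BYTES_PER_PAIR):
    """Raw size of the mapping payload, excluding bundle part headers."""
    return heads * pair_size


def breakeven_rate(payload_bytes, seconds):
    """Transfer rate (bytes/s) above which sending the mapping beats
    recomputing it on the client."""
    return payload_bytes / float(seconds)


payload = mapping_bytes(HEADS)
rate = breakeven_rate(payload, CACHE_SECONDS)
share = 100.0 * payload / CLONE_BYTES  # percentage of the cloned data

print(payload, round(rate), share)
```

The break-even rate comes out just under 1.5 KB/s and the payload is well
below 0.01% of the clone, which matches the numbers in the commit message.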
>
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -11,8 +11,9 @@ from node import hex, nullid
>   import errno, urllib
>   import util, scmutil, changegroup, base85, error, store
>   import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
>   import lock as lockmod
> +import tags
>
>   def readbundle(ui, fh, fname, vfs=None):
>       header = changegroup.readexactly(fh, 4)
>
> @@ -1284,8 +1285,49 @@ def _getbundleobsmarkerpart(bundler, rep
>           markers = repo.obsstore.relevantmarkers(subset)
>           markers = sorted(markers)
>           buildobsmarkerspart(bundler, markers)
>
> +@getbundle2partsgenerator('hgtagsfnodes')
> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
> +                         b2caps=None, heads=None, common=None,
> +                         **kwargs):
> +    """Transfer the .hgtags filenodes mapping.
> +
> +    Only values for heads in this bundle will be transferred.
> +
> +    The part data consists of pairs of 20 byte changeset node and .hgtags
> +    filenodes raw values.
> +    """
> +    # Don't send unless the client supports it.
> +    if 'hgtagsfnodes' not in b2caps:
> +        return

This has been updated to:

+    # Don't send unless:
+    # - changeset are being exchanged,
+    # - the client supports it.
+    if not (kwargs.get('cg', True) and 'hgtagsfnodes' in b2caps):
+        return

The server can receive getbundle requests that carry no changeset content
(only phases, bookmarks, or obsmarkers), so the part is skipped in that case.
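
For illustration, the updated condition can be isolated in a small
standalone predicate. This is a sketch, not code from the series:
should_send_tagsfnodes is a hypothetical name, and treating b2caps=None as
"no capabilities" is an assumption made so the example runs on its own.

```python
def should_send_tagsfnodes(b2caps=None, **kwargs):
    """Return True when the hgtagsfnodes part is worth generating.

    Hypothetical helper mirroring the guard shown above.
    """
    # 'cg' defaults to True: changesets are exchanged unless the
    # request explicitly says otherwise.
    wants_changegroup = kwargs.get('cg', True)
    # Assumption: a missing capabilities dict means "not supported".
    client_supports = 'hgtagsfnodes' in (b2caps or {})
    return bool(wants_changegroup and client_supports)


# A regular pull from a client advertising the capability.
print(should_send_tagsfnodes({'hgtagsfnodes': ()}))
# A request for phases/bookmarks/obsmarkers only (cg=False).
print(should_send_tagsfnodes({'hgtagsfnodes': ()}, cg=False))
```

Only the first call returns True; a request with cg=False never triggers
the part, whatever the client advertises.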

> +    outgoing = changegroup.computeoutgoing(repo, heads, common)
> +
> +    if not outgoing.missingheads:
> +        return
> +
> +    cache = tags.hgtagsfnodescache(repo.unfiltered())
> +    chunks = []
> +
> +    # .hgtags fnodes are only relevant for head changesets. While we could
> +    # transfer values for all known nodes, there will likely be little to
> +    # no benefit.
> +    #
> +    # We don't bother using a generator to produce output data because
> +    # a) we only have 40 bytes per head and even esoteric numbers of heads
> +    # consume little memory (1M heads is 40MB) b) we don't want to send the
> +    # part if we don't have entries and knowing if we have entries requires
> +    # cache lookups.

Thanks for the detailed comment, the next poor soul who gets lost in this
function will appreciate it.
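
For readers following along, the part layout described in the docstring
(pairs of 20-byte changeset node and 20-byte .hgtags filenode, 40 bytes per
head) can be sketched as below. The quote stops before the loop that fills
`chunks`, so this is not the code from the patch: pack_pairs and
unpack_pairs are hypothetical names, and the decoding side is an assumption
about how a client could read the data back.

```python
NODE_LEN = 20  # raw SHA-1 length in bytes


def pack_pairs(pairs):
    """Concatenate (changeset node, .hgtags filenode) pairs into one blob."""
    chunks = []
    for node, fnode in pairs:
        if len(node) != NODE_LEN or len(fnode) != NODE_LEN:
            raise ValueError('nodes must be %d raw bytes' % NODE_LEN)
        chunks.append(node)
        chunks.append(fnode)
    # Building a list first makes it easy to skip the part entirely
    # when there are no entries, as the quoted comment explains.
    return b''.join(chunks)


def unpack_pairs(data):
    """Split a blob produced by pack_pairs back into pairs."""
    pair_len = 2 * NODE_LEN
    if len(data) % pair_len:
        raise ValueError('payload is not a multiple of %d bytes' % pair_len)
    return [(data[i:i + NODE_LEN], data[i + NODE_LEN:i + pair_len])
            for i in range(0, len(data), pair_len)]


pairs = [(b'\x01' * 20, b'\x02' * 20), (b'\x03' * 20, b'\x04' * 20)]
blob = pack_pairs(pairs)
print(len(blob))
```

With two heads the blob is 80 bytes, in line with the 40-bytes-per-head
figure used in the commit message and in the comment above.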

-- 
Pierre-Yves David