[PATCH 3 of 3 v2] exchange: support transferring .hgtags fnodes mapping
pierre-yves.david at ens-lyon.org
Tue Jun 2 03:41:39 CDT 2015
On 06/01/2015 08:56 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1432599251 25200
> # Mon May 25 17:14:11 2015 -0700
> # Node ID 3735ea7dde4033078a135f0d500f7595d68aeaf3
> # Parent 3ed19563760abf1babe65f37e6b4ec6c4422ac2b
> exchange: support transferring .hgtags fnodes mapping
> On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
> ~18s from a clean cache on my machine. This means that the first time
> a user runs `hg tags`, `hg log`, or any other command that displays or
> accesses tags data, a ~18s pause will occur. There is no output during
> this pause. This results in a poor user experience and perception
> that Mercurial is slow.
> The .hgtags changeset to filenode mapping is deterministic. This
> patch takes advantage of that property by implementing support
> for transferring .hgtags filenodes mappings in a dedicated bundle2
> part. When a client advertising support for the "hgtagsfnodes"
> capability requests a bundle, a mapping of changesets to .hgtags
> filenodes will be sent to the client.
> Only mappings of head changesets included in the server-generated
> bundle will be sent. The transfer of this mapping effectively eliminates
> one time tags cache related pauses after initial clone.
We have no way to use this level of sophistication right now, so we should delay it.
> The mappings are sent as binary data. So, 40 bytes per pair of
> SHA-1s. On the aforementioned mozilla-beta repository,
> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
> (in addition to the bundle part headers). Assuming 18s to populate
> the cache, we only need to transfer this extra data faster than
> 1.5 KB/s for overall clone + tags cache population time to be shorter.
> Put into perspective, the mozilla-beta repository is ~1 GB in size.
> So, this additional data constitutes <0.01% of the cloned data.
> The marginal overhead for a multi-second performance win on clones
> in my opinion justifies an on-by-default behavior. If this turns
> out to be too naive, we can always add a heuristic to determine if
> the data transfer is warranted.
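For reference, the numbers in the commit message can be checked with a quick sketch. It assumes "~1 GB" means 10^9 bytes and that every one of the 659 pairs is sent; the variable names are only illustrative.

```python
# Figures quoted in the commit message above.
PAIRS = 659               # changeset -> .hgtags filenode pairs sent
BYTES_PER_PAIR = 40       # 20-byte changeset node + 20-byte filenode
CACHE_BUILD_SECONDS = 18  # clean-cache resolution time on mozilla-beta
REPO_BYTES = 10**9        # assumption: "~1 GB" read as 10^9 bytes

# Raw size of the mapping, excluding bundle part headers.
payload = PAIRS * BYTES_PER_PAIR

# Transfer rate at which sending the data takes as long as rebuilding the cache.
break_even = payload / CACHE_BUILD_SECONDS  # bytes per second

# Share of the total clone size, as a percentage.
share = payload / REPO_BYTES * 100

print(payload)
print(round(break_even))
print(f"{share:.4f}%")
```

The break-even rate comes out a little under the quoted 1.5 KB/s, and the share of the clone stays below the quoted 0.01%, so the commit message figures are consistent.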
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -11,8 +11,9 @@ from node import hex, nullid
> import errno, urllib
> import util, scmutil, changegroup, base85, error, store
> import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
> import lock as lockmod
> +import tags
> def readbundle(ui, fh, fname, vfs=None):
> header = changegroup.readexactly(fh, 4)
> @@ -1289,8 +1290,36 @@ def _getbundleobsmarkerpart(bundler, rep
> markers = repo.obsstore.relevantmarkers(subset)
> markers = sorted(markers)
> buildobsmarkerspart(bundler, markers)
> +@getbundle2partsgenerator('hgtagsfnodes')
> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
> +                         b2caps=None, heads=None, **kwargs):
> +    """Transfer the .hgtags filenodes mapping.
> +    Only values for heads in this bundle will be transferred.
> +    The part data consists of pairs of 20 byte changeset node and .hgtags
> +    filenodes raw values.
> +    """
> +    # Don't send unless the client supports it.
> +    if 'hgtagsfnodes' not in b2caps:
> +        return
> +    cache = tags.hgtagsfnodescache(repo.unfiltered())
> +    chunks = []
> +    # .hgtags fnodes are only relevant for head changesets. While we could
> +    # transfer values for all known nodes, there will likely be little to
> +    # no benefit.
You should make '_computeoutgoing' accessible and use it to compute the
outgoing set. This will make the first implementation simpler and let us
move forward on this feature.
> +    for node in bundler.bundledheads:
> +        # Don't compute missing, as this may slow down serving.
> +        fnode = cache.getfnode(node, computemissing=False)
> +        if fnode is not None:
> +            chunks.extend([node, fnode])
> +    if chunks:
> +        bundler.newpart('hgtagsfnodes', data=''.join(chunks))
It looks like we could use a generator for this part: just pass an
iterator yielding chunks as 'data'. This would allow streaming the part
content if it gets big. Do not worry, there is an internal buffer that
keeps the protocol-level chunks to a sensible size.
This is a hint for improvement that could go in its own future patch.
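To illustrate the idea, here is a minimal standalone sketch, not the actual patch. The names `iterfnodechunks`, `decodepairs` and `getfnode` are hypothetical, a plain dict stands in for the fnodes cache, and it is written against Python 3 bytes rather than the str type the patch uses.

```python
NODE_LEN = 20  # raw SHA-1 length in bytes


def iterfnodechunks(heads, getfnode):
    """Yield changeset node and .hgtags filenode chunks lazily.

    `getfnode` is a hypothetical stand-in for a cache lookup that
    returns None when no value is cached for the node.
    """
    for node in heads:
        fnode = getfnode(node)
        if fnode is not None:
            yield node
            yield fnode


def decodepairs(data):
    """Split a payload of concatenated 20-byte pairs back into tuples."""
    step = 2 * NODE_LEN
    if len(data) % step:
        raise ValueError('payload is not a whole number of pairs')
    return [(data[i:i + NODE_LEN], data[i + NODE_LEN:i + step])
            for i in range(0, len(data), step)]


# Toy data: three heads, one of which has no cached filenode.
heads = [bytes([i]) * NODE_LEN for i in (1, 2, 3)]
cache = {heads[0]: b'\xaa' * NODE_LEN, heads[2]: b'\xcc' * NODE_LEN}

# Joining the generator output gives the same bytes as building a list first.
payload = b''.join(iterfnodechunks(heads, cache.get))
pairs = decodepairs(payload)
```

One thing to keep in mind with a generator: the "if chunks:" check cannot be done up front, so an empty part would have to be either tolerated by the receiver or avoided some other way.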
> def check_heads(repo, their_heads, context):
> """check if the heads of a repo have been modified
> Used by peer for unbundling.
> diff --git a/tests/test-bundle2-exchange.t b/tests/test-bundle2-exchange.t
> --- a/tests/test-bundle2-exchange.t
> +++ b/tests/test-bundle2-exchange.t
> @@ -408,8 +408,93 @@ Check final content.
> $ ls -1 other/.hg/store/00changelog.i*
> +Create a repository with tags data to test .hgtags fnodes transfer
This probably belongs in the .hgtags cache test rather than the
generic bundle2 one.