[PATCH 6 of 6] bundle2: support transferring .hgtags fnodes mapping

Pierre-Yves David pierre-yves.david at ens-lyon.org
Mon Jun 1 00:40:43 CDT 2015



On 05/31/2015 05:28 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1432599251 25200
> #      Mon May 25 17:14:11 2015 -0700
> # Node ID 583b57ce3bf187f0d30db1336288f1671acfc0c2
> # Parent  8f34e8f9636d47c5094844c4e8bbf7353b95d75a
> bundle2: support transferring .hgtags fnodes mapping

small note: I love how you had a lot of clearly split patch from 1-5 why 
is this some so gargantuan?

1) add part handler
2) add optional part generation
3) add on by default part generation

> On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
> ~18s from a clean cache on my machine. This means that the first time
> a user runs `hg tags`, `hg log`, or any other command that displays or
> accesses tags data, a ~18s pause will occur. There is no output during
> this pause. This results in a poor user experience and perception
> that Mercurial is slow.
>
> The .hgtags changeset to filenode mapping is deterministic. This
> patch takes advantage of that property by implementing support
> for transferring .hgtags filenodes mappings in a dedicated bundle2
> part. When a client performs an `hg clone` or `hg pull` against a
> server advertising support for serving "hgtagsfnodes" data, a mapping
> of changesets to .hgtags filenodes will be sent back to the client.
> Only mappings of changesets included in the server-generated bundle
> will be sent. The transfer of this mapping effectively eliminates
> one time tags cache related pauses after initial clone.
>
> The mappings are sent as binary data. So, 40 bytes per pair of
> SHA-1s. On the aforementioned mozilla-beta repository,
> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
> (in addition to the bundle part headers). Assuming 18s to populate
> the cache, we only need to transfer this extra data faster than
> 1.5 KB/s for overall clone + tags cache population time to be shorter.
> Put into perspective, the mozilla-beta repository is ~1 GB in size.
> So, this additional data constitutes <0.01% of the cloned data.
> The marginal overhead for a multi-second performance win on clones
> in my opinion justifies an on-by-default behavior. If this turns
> out to be too naive, we can always add an heuristic (read complexity)
> to determine if the data transfer is warranted.
>
> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py
> @@ -155,9 +155,9 @@ import obsolete
>   import pushkey
>   import url
>   import re
>
> -import changegroup, error
> +import changegroup, error, tags
>   from i18n import _
>
>   _pack = struct.pack
>   _unpack = struct.unpack
> @@ -1112,8 +1112,9 @@ capabilities = {'HG20': (),
>                   'listkeys': (),
>                   'pushkey': (),
>                   'digests': tuple(sorted(util.DIGESTS.keys())),
>                   'remote-changegroup': ('http', 'https'),
> +                'tagsfnodes': (),
>                  }
>
>   def getrepocaps(repo, allowpushback=False):
>       """return the bundle2 capabilities for a given repo
> @@ -1362,4 +1363,23 @@ def handlepushkeyreply(op, inpart):
>       """retrieve the result of a pushkey request"""
>       ret = int(inpart.params['new'])
>       partid = int(inpart.params['in-reply-to'])
>       op.records.add('obsmarkers', {'new': ret}, partid)
> +
> + at parthandler('tagsfnodes')
> +def handletagsfnodes(op, inpart):
> +    cache = tags.hgtagsfnodescache(op.repo.unfiltered())
> +
> +    # Raw data is pairs of 20 byte changeset nodes and filenodes.
> +    data = inpart.read()

Why do you read the whole part before processing?

You could read data 20 chunk per 20 chunk:
1) you would start processing earlier is the number of entry is huge,
2) less memory consumption,
3) no manual offset management.

> +    offset = 0
> +    count = 0
> +    while offset < len(data):
> +        node = data[offset:offset + 20]
> +        offset += 20
> +        fnode = data[offset:offset + 20]
> +        offset += 20

This is data coming wrong the outside, you should check that they are 
actually 20 byte wide and ignore or error cleanly.

> +        cache.setfnode(node, fnode)
> +        count += 1
> +
> +    cache.write()
> +    op.repo.ui.debug('%i hgtags fnodes cache entries\n' % count)

Did you forgot a verb in this debug message? If not, I'm not sure how I 
would understand it in the field.


> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -11,8 +11,9 @@ from node import hex, nullid
>   import errno, urllib
>   import util, scmutil, changegroup, base85, error, store
>   import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
>   import lock as lockmod
> +import tags
>
>   def readbundle(ui, fh, fname, vfs=None):
>       header = changegroup.readexactly(fh, 4)
>
> @@ -995,8 +996,11 @@ def _pullbundle2(pullop):
>       kwargs['heads'] = pullop.heads or pullop.rheads
>       kwargs['cg'] = pullop.fetch
>       if 'listkeys' in remotecaps:
>           kwargs['listkeys'] = ['phase', 'bookmarks']
> +    # Retrieve .hgtags fnodes mappings if available.
> +    if 'tagsfnodes' in remotecaps:
> +        kwargs['tagsfnodes'] = True

I not sure we need/want a new kwargs for that. Does it make sense to not 
send them down with any changegroup part? in this case, the remote side 
should build such part it the client support it (the client send his b2 
capabilities to the server for the pull).

>       if not pullop.fetch:
>           pullop.repo.ui.status(_("no changes found\n"))
>           pullop.cgresult = 0
>       else:
> @@ -1280,8 +1284,35 @@ def _getbundleobsmarkerpart(bundler, rep
>           markers = repo.obsstore.relevantmarkers(subset)
>           markers = sorted(markers)
>           buildobsmarkerspart(bundler, markers)
>
> + at getbundle2partsgenerator('tagsfnodes')
> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
> +                         b2caps=None, heads=None, **kwargs):
> +    """Transfer the .hgtags filenodes mapping.
> +
> +    Only values for changesets part of this bundle will be transferred.
> +
> +    The part data consists of pairs of 20 byte changeset node and .hgtags
> +    filenodes raw values.
> +    """
> +    if not kwargs.get('tagsfnodes', False) or not bundler.outgoing:
> +        return

cf feedback about "I'm not sure a new argument is relevant".

> +
> +    cache = tags.hgtagsfnodescache(repo.unfiltered())
> +    chunks = []
> +    # .hgtags fnodes are only relevant for head changesets. While we could
> +    # transfer values for all known nodes, there will likely be little to
> +    # no benefit.
> +    for node in bundler.outgoing.missingheads:

The "missingheads" data is pretty trivially computed from the request 
argument.

    if heads is None:
        heads = cl.heads()
    # and that is it.

So I believe you do not have to do any attempt to carry the outgoing 
object around. This should simplify the whole thing.

> +        # Don't compute missing, as this may slow down serving.
> +        fnode = cache.getfnode(node, computemissing=False)
> +        if fnode is not None:
> +            chunks.extend([node, fnode])
> +
> +    if chunks:
> +        bundler.newpart('tagsfnodes', data=''.join(chunks))
> +
>   def check_heads(repo, their_heads, context):
>       """check if the heads of a repo have been modified
>
>       Used by peer for unbundling.
> diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
> --- a/mercurial/wireproto.py
> +++ b/mercurial/wireproto.py
> @@ -208,9 +208,11 @@ gboptsmap = {'heads':  'nodes',
>                'common': 'nodes',
>                'obsmarkers': 'boolean',
>                'bundlecaps': 'csv',
>                'listkeys': 'csv',
> -             'cg': 'boolean'}
> +             'cg': 'boolean',
> +             'tagsfnodes': 'boolean',

cf feedback about "I'm not sure a new argument is relevant".

> +             }
>
>   # client side
>
>   class wirepeer(peer.peerrepository):
> diff --git a/tests/test-bundle2-exchange.t b/tests/test-bundle2-exchange.t
> --- a/tests/test-bundle2-exchange.t
> +++ b/tests/test-bundle2-exchange.t
> @@ -408,8 +408,93 @@ Check final content.
>     other/.hg/store/phaseroots
>     $ ls -1 other/.hg/store/00changelog.i*
>     other/.hg/store/00changelog.i
>
> +Create a repository with tags data to test .hgtags fnodes transfer
> +
> +  $ hg init tagsserver
> +  $ cd tagsserver
> +  $ touch foo
> +  $ hg -q commit -A -m initial
> +  pre-close-tip:96ee1d7354c4 draft
> +  postclose-tip:96ee1d7354c4 draft
> +  txnclose hook: HG_PHASES_MOVED=1 HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
> +  $ hg tag -m 'tag 0.1' 0.1
> +  pre-close-tip:c4dab0c2fd33 draft
> +  postclose-tip:c4dab0c2fd33 draft
> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
> +  $ echo second > foo
> +  $ hg commit -m second
> +  pre-close-tip:f63cc8fe54e4 draft
> +  postclose-tip:f63cc8fe54e4 draft
> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
> +  $ hg tag -m 'tag 0.2' 0.2
> +  pre-close-tip:40f0358cb314 draft
> +  postclose-tip:40f0358cb314 draft
> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
> +  $ hg tags
> +  tip                                3:40f0358cb314
> +  0.2                                2:f63cc8fe54e4
> +  0.1                                0:96ee1d7354c4
> +  $ cd ..
> +
> +Cloning should pull down hgtags fnodes mappings and write the cache file
> +
> +  $ hg clone --pull tagsserver tagsclient
> +  requesting all changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 4 changesets with 4 changes to 2 files
> +  pre-close-tip:40f0358cb314 draft
> +  postclose-tip:40f0358cb314 draft
> +  txnclose hook: HG_NODE=96ee1d7354c4ad7372047672c36a1f561e3a6a4c HG_PHASES_MOVED=1 HG_SOURCE=pull HG_TXNID=TXN:* HG_TXNNAME=pull (glob)
> +  file:/*/$TESTTMP/tagsserver HG_URL=file:$TESTTMP/tagsserver (glob)
> +  updating to branch default
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +
> +Missing tags2* files means the cache wasn't written through the normal mechanism.
> +
> +  $ ls tagsclient/.hg/cache
> +  branch2-served
> +  hgtagsfnodes1
> +  rbc-names-v1
> +  rbc-revs-v1
> +
> +Cache should contain the head only, even though other nodes have tags data
> +
> +  $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
> +  tagsclient/.hg/cache/hgtagsfnodes1: size=96
> +  0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........ at .5.....|
> +  0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
> +
> +Running hg tags should produce tags2* file and not change cache
> +
> +  $ hg -R tagsclient tags
> +  tip                                3:40f0358cb314
> +  0.2                                2:f63cc8fe54e4
> +  0.1                                0:96ee1d7354c4
> +
> +  $ ls tagsclient/.hg/cache
> +  branch2-served
> +  hgtagsfnodes1
> +  rbc-names-v1
> +  rbc-revs-v1
> +  tags2-visible
> +
> +  $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
> +  tagsclient/.hg/cache/hgtagsfnodes1: size=96
> +  0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> +  0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........ at .5.....|
> +  0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
> +
>   Error Handling
>   ==============
>
>   Check that errors are properly returned to the client during push.
> diff --git a/tests/test-hgweb-commands.t b/tests/test-hgweb-commands.t
> --- a/tests/test-hgweb-commands.t
> +++ b/tests/test-hgweb-commands.t
> @@ -1885,9 +1885,9 @@ capabilities
>
>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=capabilities'; echo
>     200 Script output follows
>
> -  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
> +  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>
>   heads
>
>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=heads'
> @@ -2065,9 +2065,9 @@ capabilities
>
>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=capabilities'; echo
>     200 Script output follows
>
> -  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
> +  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>
>   heads
>
>   ERRORS ENCOUNTERED
> diff --git a/tests/test-ssh.t b/tests/test-ssh.t
> --- a/tests/test-ssh.t
> +++ b/tests/test-ssh.t
> @@ -451,10 +451,10 @@ debug output
>     pulling from ssh://user@dummy/remote
>     running python "*/dummyssh" user at dummy 'hg -R remote serve --stdio' (glob)
>     sending hello command
>     sending between command
> -  remote: 271
> -  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
> +  remote: 284
> +  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>     remote: 1
>     preparing listkeys for "bookmarks"
>     sending listkeys command
>     received listkey for "bookmarks": 45 bytes
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list