[PATCH 6 of 6] bundle2: support transferring .hgtags fnodes mapping
Gregory Szorc
gregory.szorc@gmail.com
Mon Jun 1 22:02:41 CDT 2015
On Sun, May 31, 2015 at 10:40 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
>
>
> On 05/31/2015 05:28 PM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1432599251 25200
>> # Mon May 25 17:14:11 2015 -0700
>> # Node ID 583b57ce3bf187f0d30db1336288f1671acfc0c2
>> # Parent 8f34e8f9636d47c5094844c4e8bbf7353b95d75a
>> bundle2: support transferring .hgtags fnodes mapping
>>
>
> small note: I love how you had a lot of clearly split patches in 1-5; why
> is this one so gargantuan?
>
> 1) add part handler
> 2) add optional part generation
> 3) add on by default part generation
Sorry. I had to implement this all at once in order to figure out the
bundle2 API and I didn't think to split up the patch before patchbombing.
I'll do that next time.
>
>
>> On Mozilla's mozilla-beta repository, .hgtags fnodes resolution takes
>> ~18s from a clean cache on my machine. This means that the first time
>> a user runs `hg tags`, `hg log`, or any other command that displays or
>> accesses tags data, a ~18s pause will occur. There is no output during
>> this pause. This results in a poor user experience and perception
>> that Mercurial is slow.
>>
>> The .hgtags changeset to filenode mapping is deterministic. This
>> patch takes advantage of that property by implementing support
>> for transferring .hgtags filenodes mappings in a dedicated bundle2
>> part. When a client performs an `hg clone` or `hg pull` against a
>> server advertising support for serving "hgtagsfnodes" data, a mapping
>> of changesets to .hgtags filenodes will be sent back to the client.
>> Only mappings of changesets included in the server-generated bundle
>> will be sent. The transfer of this mapping effectively eliminates
>> one-time, tags-cache-related pauses after the initial clone.
>>
>> The mappings are sent as binary data. So, 40 bytes per pair of
>> SHA-1s. On the aforementioned mozilla-beta repository,
>> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
>> (in addition to the bundle part headers). Assuming 18s to populate
>> the cache, we only need to transfer this extra data faster than
>> 1.5 KB/s for overall clone + tags cache population time to be shorter.
>> Put into perspective, the mozilla-beta repository is ~1 GB in size.
>> So, this additional data constitutes <0.01% of the cloned data.
>> The marginal overhead for a multi-second performance win on clones
>> in my opinion justifies an on-by-default behavior. If this turns
>> out to be too naive, we can always add a heuristic (read: complexity)
>> to determine if the data transfer is warranted.
>>
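[Editor's note: the break-even claim above is easy to verify with pure arithmetic on the figures quoted in the message; the snippet assumes nothing beyond those numbers.]

```python
# Sanity-check the break-even rate quoted above using the message's numbers.
pairs = 659            # changeset -> .hgtags fnode mappings (mozilla-beta)
payload = pairs * 40   # two binary SHA-1s per pair, 20 bytes each
assert payload == 26360  # raw bytes on the wire, excluding part headers

cache_population = 18.0             # seconds to build the cache locally
rate = payload / cache_population   # bytes/second needed to break even
print('%.2f KB/s' % (rate / 1024))  # ~1.43 KB/s, i.e. under the 1.5 KB/s cited
```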
>> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -155,9 +155,9 @@ import obsolete
>> import pushkey
>> import url
>> import re
>>
>> -import changegroup, error
>> +import changegroup, error, tags
>> from i18n import _
>>
>> _pack = struct.pack
>> _unpack = struct.unpack
>> @@ -1112,8 +1112,9 @@ capabilities = {'HG20': (),
>> 'listkeys': (),
>> 'pushkey': (),
>> 'digests': tuple(sorted(util.DIGESTS.keys())),
>> 'remote-changegroup': ('http', 'https'),
>> + 'tagsfnodes': (),
>> }
>>
>> def getrepocaps(repo, allowpushback=False):
>> """return the bundle2 capabilities for a given repo
>> @@ -1362,4 +1363,23 @@ def handlepushkeyreply(op, inpart):
>> """retrieve the result of a pushkey request"""
>> ret = int(inpart.params['new'])
>> partid = int(inpart.params['in-reply-to'])
>> op.records.add('obsmarkers', {'new': ret}, partid)
>> +
>> +@parthandler('tagsfnodes')
>> +def handletagsfnodes(op, inpart):
>> + cache = tags.hgtagsfnodescache(op.repo.unfiltered())
>> +
>> + # Raw data is pairs of 20 byte changeset nodes and filenodes.
>> + data = inpart.read()
>>
>
> Why do you read the whole part before processing?
>
Old habit: trading memory for fewer I/O-related system calls. I see the
unbundler is using some kind of in-memory buffer, so I'll happily switch
this to reading 20 bytes at a time since apparently that won't translate to
system calls.
>
> You could read the data 20 bytes at a time:
> 1) you would start processing earlier if the number of entries is huge,
> 2) less memory consumption,
> 3) no manual offset management.
>
>> + offset = 0
>> + count = 0
>> + while offset < len(data):
>> + node = data[offset:offset + 20]
>> + offset += 20
>> + fnode = data[offset:offset + 20]
>> + offset += 20
>>
>
> This is data coming from the outside; you should check that the records are
> actually 20 bytes wide and ignore them or error cleanly.
>
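[Editor's note: both review points — streaming the part 20 bytes at a time and failing cleanly on malformed input — could be folded into one loop. A hypothetical sketch follows; a plain file-like object stands in for the bundle2 part, and `iter_fnode_records` is an illustrative name, not Mercurial API.]

```python
import io

def iter_fnode_records(part, recsize=20):
    """Yield (node, fnode) pairs from a stream of concatenated binary
    SHA-1s, without slurping the whole part into memory and without
    manual offset bookkeeping."""
    while True:
        node = part.read(recsize)
        if not node:
            break  # clean end of stream
        fnode = part.read(recsize)
        if len(node) != recsize or len(fnode) != recsize:
            # The data comes from the outside: reject truncated records
            # instead of caching a short hash.
            raise ValueError('truncated tagsfnodes record')
        yield node, fnode

# The handler body would then reduce to:
#   for node, fnode in iter_fnode_records(inpart):
#       cache.setfnode(node, fnode)
part = io.BytesIO(b'\x01' * 20 + b'\x02' * 20)
assert list(iter_fnode_records(part)) == [(b'\x01' * 20, b'\x02' * 20)]
```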
>> + cache.setfnode(node, fnode)
>> + count += 1
>> +
>> + cache.write()
>> + op.repo.ui.debug('%i hgtags fnodes cache entries\n' % count)
>>
>
> Did you forget a verb in this debug message? If not, I'm not sure how I
> would understand it in the field.
>
>
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
>> --- a/mercurial/exchange.py
>> +++ b/mercurial/exchange.py
>> @@ -11,8 +11,9 @@ from node import hex, nullid
>> import errno, urllib
>> import util, scmutil, changegroup, base85, error, store
>> import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
>> import lock as lockmod
>> +import tags
>>
>> def readbundle(ui, fh, fname, vfs=None):
>> header = changegroup.readexactly(fh, 4)
>>
>> @@ -995,8 +996,11 @@ def _pullbundle2(pullop):
>> kwargs['heads'] = pullop.heads or pullop.rheads
>> kwargs['cg'] = pullop.fetch
>> if 'listkeys' in remotecaps:
>> kwargs['listkeys'] = ['phase', 'bookmarks']
>> + # Retrieve .hgtags fnodes mappings if available.
>> + if 'tagsfnodes' in remotecaps:
>> + kwargs['tagsfnodes'] = True
>>
>
> I'm not sure we need/want a new kwarg for that. Does it make sense to simply
> send them down with any changegroup part? In that case, the remote side
> should build such a part if the client supports it (the client sends its b2
> capabilities to the server for the pull).
>
This was me not grokking bundle2. I'll revisit this.
>
> if not pullop.fetch:
>> pullop.repo.ui.status(_("no changes found\n"))
>> pullop.cgresult = 0
>> else:
>> @@ -1280,8 +1284,35 @@ def _getbundleobsmarkerpart(bundler, rep
>> markers = repo.obsstore.relevantmarkers(subset)
>> markers = sorted(markers)
>> buildobsmarkerspart(bundler, markers)
>>
>> +@getbundle2partsgenerator('tagsfnodes')
>> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
>> + b2caps=None, heads=None, **kwargs):
>> + """Transfer the .hgtags filenodes mapping.
>> +
>> + Only values for changesets part of this bundle will be transferred.
>> +
>> + The part data consists of pairs of 20 byte changeset node and .hgtags
>> + filenodes raw values.
>> + """
>> + if not kwargs.get('tagsfnodes', False) or not bundler.outgoing:
>> + return
>>
>
> cf feedback about "I'm not sure a new argument is relevant".
>
> +
>> + cache = tags.hgtagsfnodescache(repo.unfiltered())
>> + chunks = []
>> + # .hgtags fnodes are only relevant for head changesets. While we could
>> + # transfer values for all known nodes, there will likely be little to
>> + # no benefit.
>> + for node in bundler.outgoing.missingheads:
>>
>
> The "missingheads" data is pretty trivially computed from the request
> argument.
>
> if heads is None:
> heads = cl.heads()
> # and that is it.
>
> So I believe you do not have to make any attempt to carry the outgoing
> object around. This should simplify the whole thing.
>
>> + # Don't compute missing, as this may slow down serving.
>> + fnode = cache.getfnode(node, computemissing=False)
>> + if fnode is not None:
>> + chunks.extend([node, fnode])
>> +
>> + if chunks:
>> + bundler.newpart('tagsfnodes', data=''.join(chunks))
>> +
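[Editor's note: if the `outgoing` object is dropped as suggested, the generator's core reduces to deriving heads from the request and filtering cached fnodes. A rough, self-contained model of that logic follows; `build_tagsfnodes_part` and the dict-backed cache are stand-ins, not the real `hgtagsfnodescache` API.]

```python
def build_tagsfnodes_part(heads, cl_heads, fnode_cache):
    """Model of the simplified part generator: fall back to all changelog
    heads when the request names none, then collect only the fnodes the
    cache already knows (the computemissing=False behaviour)."""
    if heads is None:
        heads = cl_heads  # the reviewer's `heads = cl.heads()` fallback
    chunks = []
    for node in heads:
        fnode = fnode_cache.get(node)  # None when not yet computed
        if fnode is not None:
            chunks.extend([node, fnode])
    return b''.join(chunks)

cache = {b'h' * 20: b'f' * 20}
# No explicit heads requested: fall back to the changelog heads.
assert build_tagsfnodes_part(None, [b'h' * 20], cache) == b'h' * 20 + b'f' * 20
# Requested head with no cached fnode: nothing to send.
assert build_tagsfnodes_part([b'x' * 20], [b'h' * 20], cache) == b''
```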
>> def check_heads(repo, their_heads, context):
>> """check if the heads of a repo have been modified
>>
>> Used by peer for unbundling.
>> diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>> --- a/mercurial/wireproto.py
>> +++ b/mercurial/wireproto.py
>> @@ -208,9 +208,11 @@ gboptsmap = {'heads': 'nodes',
>> 'common': 'nodes',
>> 'obsmarkers': 'boolean',
>> 'bundlecaps': 'csv',
>> 'listkeys': 'csv',
>> - 'cg': 'boolean'}
>> + 'cg': 'boolean',
>> + 'tagsfnodes': 'boolean',
>>
>
> cf feedback about "I'm not sure a new argument is relevant".
>
>> + }
>>
>> # client side
>>
>> class wirepeer(peer.peerrepository):
>> diff --git a/tests/test-bundle2-exchange.t b/tests/test-bundle2-exchange.t
>> --- a/tests/test-bundle2-exchange.t
>> +++ b/tests/test-bundle2-exchange.t
>> @@ -408,8 +408,93 @@ Check final content.
>> other/.hg/store/phaseroots
>> $ ls -1 other/.hg/store/00changelog.i*
>> other/.hg/store/00changelog.i
>>
>> +Create a repository with tags data to test .hgtags fnodes transfer
>> +
>> + $ hg init tagsserver
>> + $ cd tagsserver
>> + $ touch foo
>> + $ hg -q commit -A -m initial
>> + pre-close-tip:96ee1d7354c4 draft
>> + postclose-tip:96ee1d7354c4 draft
>> + txnclose hook: HG_PHASES_MOVED=1 HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> + $ hg tag -m 'tag 0.1' 0.1
>> + pre-close-tip:c4dab0c2fd33 draft
>> + postclose-tip:c4dab0c2fd33 draft
>> + txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> + $ echo second > foo
>> + $ hg commit -m second
>> + pre-close-tip:f63cc8fe54e4 draft
>> + postclose-tip:f63cc8fe54e4 draft
>> + txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> + $ hg tag -m 'tag 0.2' 0.2
>> + pre-close-tip:40f0358cb314 draft
>> + postclose-tip:40f0358cb314 draft
>> + txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> + $ hg tags
>> + tip 3:40f0358cb314
>> + 0.2 2:f63cc8fe54e4
>> + 0.1 0:96ee1d7354c4
>> + $ cd ..
>> +
>> +Cloning should pull down hgtags fnodes mappings and write the cache file
>> +
>> + $ hg clone --pull tagsserver tagsclient
>> + requesting all changes
>> + adding changesets
>> + adding manifests
>> + adding file changes
>> + added 4 changesets with 4 changes to 2 files
>> + pre-close-tip:40f0358cb314 draft
>> + postclose-tip:40f0358cb314 draft
>> + txnclose hook: HG_NODE=96ee1d7354c4ad7372047672c36a1f561e3a6a4c HG_PHASES_MOVED=1 HG_SOURCE=pull HG_TXNID=TXN:* HG_TXNNAME=pull (glob)
>> + file:/*/$TESTTMP/tagsserver HG_URL=file:$TESTTMP/tagsserver (glob)
>> + updating to branch default
>> + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>> +
>> +Missing tags2* files means the cache wasn't written through the normal mechanism.
>> +
>> + $ ls tagsclient/.hg/cache
>> + branch2-served
>> + hgtagsfnodes1
>> + rbc-names-v1
>> + rbc-revs-v1
>> +
>> +Cache should contain the head only, even though other nodes have tags data
>> +
>> + $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
>> + tagsclient/.hg/cache/hgtagsfnodes1: size=96
>> + 0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........@.5.....|
>> + 0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
>> +
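[Editor's note: to help decode the dump, the cache appears to hold one 24-byte record per revision — a 4-byte changeset-node prefix followed by the 20-byte .hgtags filenode — with never-computed entries filled with 0xff. That layout is inferred from the dump (96 bytes for 4 revisions; the bytes at offset 0x48 start with 40f0358c, the prefix of changeset 40f0358cb314), not quoted from the patch. A sketch of a reader under that assumption:]

```python
RECSIZE = 24  # 4-byte changeset-node prefix + 20-byte fnode (inferred layout)

def parse_hgtagsfnodes(data):
    """Return {rev: (node_prefix, fnode)} for populated records; all-0xff
    records mean 'not computed yet' and are skipped."""
    entries = {}
    for rev in range(len(data) // RECSIZE):
        rec = data[rev * RECSIZE:(rev + 1) * RECSIZE]
        if rec == b'\xff' * RECSIZE:
            continue
        entries[rev] = (rec[:4], rec[4:])
    return entries

# Rebuild the 96-byte file shown above: revs 0-2 unset, rev 3 populated.
fnode = bytes.fromhex('19e0a7d38a5c6a824dcffba587d02fa31e4f2f8a')
data = b'\xff' * 72 + bytes.fromhex('40f0358c') + fnode
entries = parse_hgtagsfnodes(data)
assert sorted(entries) == [3]
assert entries[3][0] == bytes.fromhex('40f0358c')  # prefix of 40f0358cb314
assert entries[3][1] == fnode
```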
>> +Running hg tags should produce tags2* file and not change cache
>> +
>> + $ hg -R tagsclient tags
>> + tip 3:40f0358cb314
>> + 0.2 2:f63cc8fe54e4
>> + 0.1 0:96ee1d7354c4
>> +
>> + $ ls tagsclient/.hg/cache
>> + branch2-served
>> + hgtagsfnodes1
>> + rbc-names-v1
>> + rbc-revs-v1
>> + tags2-visible
>> +
>> + $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
>> + tagsclient/.hg/cache/hgtagsfnodes1: size=96
>> + 0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> + 0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........@.5.....|
>> + 0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
>> +
>> Error Handling
>> ==============
>>
>> Check that errors are properly returned to the client during push.
>> diff --git a/tests/test-hgweb-commands.t b/tests/test-hgweb-commands.t
>> --- a/tests/test-hgweb-commands.t
>> +++ b/tests/test-hgweb-commands.t
>> @@ -1885,9 +1885,9 @@ capabilities
>>
>> $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT
>> '?cmd=capabilities'; echo
>> 200 Script output follows
>>
>> - lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
>> + lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>>
>> heads
>>
>> $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=heads'
>> @@ -2065,9 +2065,9 @@ capabilities
>>
>> $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT
>> '?cmd=capabilities'; echo
>> 200 Script output follows
>>
>> - lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
>> + lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>>
>> heads
>>
>> ERRORS ENCOUNTERED
>> diff --git a/tests/test-ssh.t b/tests/test-ssh.t
>> --- a/tests/test-ssh.t
>> +++ b/tests/test-ssh.t
>> @@ -451,10 +451,10 @@ debug output
>> pulling from ssh://user@dummy/remote
>> running python "*/dummyssh" user@dummy 'hg -R remote serve --stdio' (glob)
>> sending hello command
>> sending between command
>> - remote: 271
>> - remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>> + remote: 284
>> + remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>> remote: 1
>> preparing listkeys for "bookmarks"
>> sending listkeys command
>> received listkey for "bookmarks": 45 bytes
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> https://selenic.com/mailman/listinfo/mercurial-devel
>>
>>
> --
> Pierre-Yves David
>