[PATCH 6 of 6] bundle2: support transferring .hgtags fnodes mapping

Gregory Szorc gregory.szorc@gmail.com
Mon Jun 1 22:02:41 CDT 2015


On Sun, May 31, 2015 at 10:40 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 05/31/2015 05:28 PM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1432599251 25200
>> #      Mon May 25 17:14:11 2015 -0700
>> # Node ID 583b57ce3bf187f0d30db1336288f1671acfc0c2
>> # Parent  8f34e8f9636d47c5094844c4e8bbf7353b95d75a
>> bundle2: support transferring .hgtags fnodes mapping
>>
>
> small note: I love how patches 1-5 were clearly split up, so why
> is this one so gargantuan?
>
> 1) add part handler
> 2) add optional part generation
> 3) add on by default part generation


Sorry. I had to implement this all at once in order to figure out the
bundle2 API and I didn't think to split up the patch before patchbombing.
I'll do that next time.


>
>
>> On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
>> ~18s from a clean cache on my machine. This means that the first time
>> a user runs `hg tags`, `hg log`, or any other command that displays or
>> accesses tags data, a ~18s pause will occur. There is no output during
>> this pause. This results in a poor user experience and perception
>> that Mercurial is slow.
>>
>> The .hgtags changeset to filenode mapping is deterministic. This
>> patch takes advantage of that property by implementing support
>> for transferring .hgtags filenodes mappings in a dedicated bundle2
>> part. When a client performs an `hg clone` or `hg pull` against a
>> server advertising support for serving "hgtagsfnodes" data, a mapping
>> of changesets to .hgtags filenodes will be sent back to the client.
>> Only mappings of changesets included in the server-generated bundle
>> will be sent. The transfer of this mapping effectively eliminates
>> one-time tags cache related pauses after the initial clone.
>>
>> The mappings are sent as binary data. So, 40 bytes per pair of
>> SHA-1s. On the aforementioned mozilla-beta repository,
>> 659 * 40 = 26,360 raw bytes of mappings are sent over the wire
>> (in addition to the bundle part headers). Assuming 18s to populate
>> the cache, we only need to transfer this extra data faster than
>> 1.5 KB/s for overall clone + tags cache population time to be shorter.
>> To put this in perspective, the mozilla-beta repository is ~1 GB in size,
>> so this additional data constitutes <0.01% of the cloned data.
>> The marginal overhead, in my opinion, justifies on-by-default behavior
>> for a multi-second performance win on clones. If this turns
>> out to be too naive, we can always add a heuristic (read: complexity)
>> to determine whether the data transfer is warranted.
>>
>> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -155,9 +155,9 @@ import obsolete
>>   import pushkey
>>   import url
>>   import re
>>
>> -import changegroup, error
>> +import changegroup, error, tags
>>   from i18n import _
>>
>>   _pack = struct.pack
>>   _unpack = struct.unpack
>> @@ -1112,8 +1112,9 @@ capabilities = {'HG20': (),
>>                   'listkeys': (),
>>                   'pushkey': (),
>>                   'digests': tuple(sorted(util.DIGESTS.keys())),
>>                   'remote-changegroup': ('http', 'https'),
>> +                'tagsfnodes': (),
>>                  }
>>
>>   def getrepocaps(repo, allowpushback=False):
>>       """return the bundle2 capabilities for a given repo
>> @@ -1362,4 +1363,23 @@ def handlepushkeyreply(op, inpart):
>>       """retrieve the result of a pushkey request"""
>>       ret = int(inpart.params['new'])
>>       partid = int(inpart.params['in-reply-to'])
>>       op.records.add('obsmarkers', {'new': ret}, partid)
>> +
>> +@parthandler('tagsfnodes')
>> +def handletagsfnodes(op, inpart):
>> +    cache = tags.hgtagsfnodescache(op.repo.unfiltered())
>> +
>> +    # Raw data is pairs of 20 byte changeset nodes and filenodes.
>> +    data = inpart.read()
>>
>
> Why do you read the whole part before processing?
>

Old habit of trading memory for fewer I/O-related system calls. I see the
unbundler is using some kind of in-memory buffer, so I'll happily switch
this to reading 20 bytes at a time since apparently that won't translate to
extra system calls.
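
Something roughly like this, I imagine (untested sketch; I'm assuming the
part reader's read() accepts a byte count and behaves like a regular file
read, returning an empty string at the end of the part). It also folds in
your point below about validating that incoming chunks are actually 20
bytes wide:

  @parthandler('tagsfnodes')
  def handletagsfnodes(op, inpart):
      cache = tags.hgtagsfnodescache(op.repo.unfiltered())
      count = 0
      while True:
          # Read one changeset node/filenode pair at a time instead of
          # buffering the entire part in memory.
          node = inpart.read(20)
          fnode = inpart.read(20)
          if not node:
              break
          if len(node) < 20 or len(fnode) < 20:
              op.repo.ui.debug('ignoring incomplete .hgtags fnodes data\n')
              break
          cache.setfnode(node, fnode)
          count += 1
      cache.write()
      op.repo.ui.debug('applied %i hgtags fnodes cache entries\n' % count)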


>
> You could read the data in 20-byte chunks:
> 1) you would start processing earlier if the number of entries is huge,
> 2) less memory consumption,
> 3) no manual offset management.
>
>> +    offset = 0
>> +    count = 0
>> +    while offset < len(data):
>> +        node = data[offset:offset + 20]
>> +        offset += 20
>> +        fnode = data[offset:offset + 20]
>> +        offset += 20
>>
>
> This is data coming from the outside; you should check that the chunks are
> actually 20 bytes wide and either ignore them or error out cleanly.
>
>> +        cache.setfnode(node, fnode)
>> +        count += 1
>> +
>> +    cache.write()
>> +    op.repo.ui.debug('%i hgtags fnodes cache entries\n' % count)
>>
>
> Did you forget a verb in this debug message? If not, I'm not sure how I
> would understand it in the field.
>
>
>  diff --git a/mercurial/exchange.py b/mercurial/exchange.py
>> --- a/mercurial/exchange.py
>> +++ b/mercurial/exchange.py
>> @@ -11,8 +11,9 @@ from node import hex, nullid
>>   import errno, urllib
>>   import util, scmutil, changegroup, base85, error, store
>>   import discovery, phases, obsolete, bookmarks as bookmod, bundle2, pushkey
>>   import lock as lockmod
>> +import tags
>>
>>   def readbundle(ui, fh, fname, vfs=None):
>>       header = changegroup.readexactly(fh, 4)
>>
>> @@ -995,8 +996,11 @@ def _pullbundle2(pullop):
>>       kwargs['heads'] = pullop.heads or pullop.rheads
>>       kwargs['cg'] = pullop.fetch
>>       if 'listkeys' in remotecaps:
>>           kwargs['listkeys'] = ['phase', 'bookmarks']
>> +    # Retrieve .hgtags fnodes mappings if available.
>> +    if 'tagsfnodes' in remotecaps:
>> +        kwargs['tagsfnodes'] = True
>>
>
> I'm not sure we need/want a new kwarg for that. Does it make sense to just
> send them down with any changegroup part? In that case, the remote side
> should build such a part if the client supports it (the client sends its b2
> capabilities to the server for the pull).
>

This was me not grokking bundle2. I'll revisit this.
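
If I understand the suggestion, the early return at the top of the part
generator would then key off the client's advertised bundle2 capabilities
rather than a new getbundle argument, i.e. something like (untested):

  # Hypothetical replacement for the kwargs/outgoing check:
  if b2caps is None or 'tagsfnodes' not in b2caps:
      return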


>
>>       if not pullop.fetch:
>>           pullop.repo.ui.status(_("no changes found\n"))
>>           pullop.cgresult = 0
>>       else:
>> @@ -1280,8 +1284,35 @@ def _getbundleobsmarkerpart(bundler, rep
>>           markers = repo.obsstore.relevantmarkers(subset)
>>           markers = sorted(markers)
>>           buildobsmarkerspart(bundler, markers)
>>
>> +@getbundle2partsgenerator('tagsfnodes')
>> +def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
>> +                         b2caps=None, heads=None, **kwargs):
>> +    """Transfer the .hgtags filenodes mapping.
>> +
>> +    Only values for changesets part of this bundle will be transferred.
>> +
>> +    The part data consists of pairs of 20 byte changeset node and .hgtags
>> +    filenodes raw values.
>> +    """
>> +    if not kwargs.get('tagsfnodes', False) or not bundler.outgoing:
>> +        return
>>
>
> cf feedback about "I'm not sure a new argument is relevant".
>
>> +
>> +    cache = tags.hgtagsfnodescache(repo.unfiltered())
>> +    chunks = []
>> +    # .hgtags fnodes are only relevant for head changesets. While we could
>> +    # transfer values for all known nodes, there will likely be little to
>> +    # no benefit.
>> +    for node in bundler.outgoing.missingheads:
>>
>
> The "missingheads" data is pretty trivially computed from the request
> argument.
>
>    if heads is None:
>        heads = cl.heads()
>    # and that is it.
>
> So I believe you do not need to carry the outgoing
> object around at all. This should simplify the whole thing.
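
For reference, combining that with keying off b2caps instead of a new
argument, the part generator might end up looking roughly like this
(untested sketch, not necessarily what I'll send in V2):

  @getbundle2partsgenerator('tagsfnodes')
  def _getbundletagsfnodes(bundler, repo, source, bundlecaps=None,
                           b2caps=None, heads=None, **kwargs):
      """Transfer the .hgtags filenodes mapping.

      The part data consists of pairs of 20 byte changeset node and
      .hgtags filenode raw values.
      """
      # Only send the part to clients advertising support for it.
      if b2caps is None or 'tagsfnodes' not in b2caps:
          return

      # Derive the relevant heads from the request arguments instead of
      # carrying the outgoing object on the bundler.
      if heads is None:
          heads = repo.changelog.heads()

      cache = tags.hgtagsfnodescache(repo.unfiltered())
      chunks = []
      for node in heads:
          # Don't compute missing entries, as that may slow down serving.
          fnode = cache.getfnode(node, computemissing=False)
          if fnode is not None:
              chunks.extend([node, fnode])

      if chunks:
          bundler.newpart('tagsfnodes', data=''.join(chunks))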
>
>> +        # Don't compute missing, as this may slow down serving.
>> +        fnode = cache.getfnode(node, computemissing=False)
>> +        if fnode is not None:
>> +            chunks.extend([node, fnode])
>> +
>> +    if chunks:
>> +        bundler.newpart('tagsfnodes', data=''.join(chunks))
>> +
>>   def check_heads(repo, their_heads, context):
>>       """check if the heads of a repo have been modified
>>
>>       Used by peer for unbundling.
>> diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>> --- a/mercurial/wireproto.py
>> +++ b/mercurial/wireproto.py
>> @@ -208,9 +208,11 @@ gboptsmap = {'heads':  'nodes',
>>                'common': 'nodes',
>>                'obsmarkers': 'boolean',
>>                'bundlecaps': 'csv',
>>                'listkeys': 'csv',
>> -             'cg': 'boolean'}
>> +             'cg': 'boolean',
>> +             'tagsfnodes': 'boolean',
>>
>
> cf feedback about "I'm not sure a new argument is relevant".
>
>> +             }
>>
>>   # client side
>>
>>   class wirepeer(peer.peerrepository):
>> diff --git a/tests/test-bundle2-exchange.t b/tests/test-bundle2-exchange.t
>> --- a/tests/test-bundle2-exchange.t
>> +++ b/tests/test-bundle2-exchange.t
>> @@ -408,8 +408,93 @@ Check final content.
>>     other/.hg/store/phaseroots
>>     $ ls -1 other/.hg/store/00changelog.i*
>>     other/.hg/store/00changelog.i
>>
>> +Create a repository with tags data to test .hgtags fnodes transfer
>> +
>> +  $ hg init tagsserver
>> +  $ cd tagsserver
>> +  $ touch foo
>> +  $ hg -q commit -A -m initial
>> +  pre-close-tip:96ee1d7354c4 draft
>> +  postclose-tip:96ee1d7354c4 draft
>> +  txnclose hook: HG_PHASES_MOVED=1 HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> +  $ hg tag -m 'tag 0.1' 0.1
>> +  pre-close-tip:c4dab0c2fd33 draft
>> +  postclose-tip:c4dab0c2fd33 draft
>> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> +  $ echo second > foo
>> +  $ hg commit -m second
>> +  pre-close-tip:f63cc8fe54e4 draft
>> +  postclose-tip:f63cc8fe54e4 draft
>> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> +  $ hg tag -m 'tag 0.2' 0.2
>> +  pre-close-tip:40f0358cb314 draft
>> +  postclose-tip:40f0358cb314 draft
>> +  txnclose hook: HG_TXNID=TXN:* HG_TXNNAME=commit (glob)
>> +  $ hg tags
>> +  tip                                3:40f0358cb314
>> +  0.2                                2:f63cc8fe54e4
>> +  0.1                                0:96ee1d7354c4
>> +  $ cd ..
>> +
>> +Cloning should pull down hgtags fnodes mappings and write the cache file
>> +
>> +  $ hg clone --pull tagsserver tagsclient
>> +  requesting all changes
>> +  adding changesets
>> +  adding manifests
>> +  adding file changes
>> +  added 4 changesets with 4 changes to 2 files
>> +  pre-close-tip:40f0358cb314 draft
>> +  postclose-tip:40f0358cb314 draft
>> +  txnclose hook: HG_NODE=96ee1d7354c4ad7372047672c36a1f561e3a6a4c HG_PHASES_MOVED=1 HG_SOURCE=pull HG_TXNID=TXN:* HG_TXNNAME=pull (glob)
>> +  file:/*/$TESTTMP/tagsserver HG_URL=file:$TESTTMP/tagsserver (glob)
>> +  updating to branch default
>> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>> +
>> +Missing tags2* files means the cache wasn't written through the normal mechanism.
>> +
>> +  $ ls tagsclient/.hg/cache
>> +  branch2-served
>> +  hgtagsfnodes1
>> +  rbc-names-v1
>> +  rbc-revs-v1
>> +
>> +Cache should contain the head only, even though other nodes have tags data
>> +
>> +  $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
>> +  tagsclient/.hg/cache/hgtagsfnodes1: size=96
>> +  0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........@.5.....|
>> +  0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
>> +
>> +Running hg tags should produce tags2* file and not change cache
>> +
>> +  $ hg -R tagsclient tags
>> +  tip                                3:40f0358cb314
>> +  0.2                                2:f63cc8fe54e4
>> +  0.1                                0:96ee1d7354c4
>> +
>> +  $ ls tagsclient/.hg/cache
>> +  branch2-served
>> +  hgtagsfnodes1
>> +  rbc-names-v1
>> +  rbc-revs-v1
>> +  tags2-visible
>> +
>> +  $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
>> +  tagsclient/.hg/cache/hgtagsfnodes1: size=96
>> +  0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>> +  0040: ff ff ff ff ff ff ff ff 40 f0 35 8c 19 e0 a7 d3 |........@.5.....|
>> +  0050: 8a 5c 6a 82 4d cf fb a5 87 d0 2f a3 1e 4f 2f 8a |.\j.M...../..O/.|
>> +
>>   Error Handling
>>   ==============
>>
>>   Check that errors are properly returned to the client during push.
>> diff --git a/tests/test-hgweb-commands.t b/tests/test-hgweb-commands.t
>> --- a/tests/test-hgweb-commands.t
>> +++ b/tests/test-hgweb-commands.t
>> @@ -1885,9 +1885,9 @@ capabilities
>>
>>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=capabilities'; echo
>>     200 Script output follows
>>
>> -  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
>> +  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>>
>>   heads
>>
>>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=heads'
>> @@ -2065,9 +2065,9 @@ capabilities
>>
>>     $ "$TESTDIR/get-with-headers.py" 127.0.0.1:$HGPORT '?cmd=capabilities'; echo
>>     200 Script output follows
>>
>> -  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1*%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (glob)
>> +  lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>>
>>   heads
>>
>>   ERRORS ENCOUNTERED
>> diff --git a/tests/test-ssh.t b/tests/test-ssh.t
>> --- a/tests/test-ssh.t
>> +++ b/tests/test-ssh.t
>> @@ -451,10 +451,10 @@ debug output
>>     pulling from ssh://user@dummy/remote
>>     running python "*/dummyssh" user@dummy 'hg -R remote serve --stdio' (glob)
>>     sending hello command
>>     sending between command
>> -  remote: 271
>> -  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>> +  remote: 284
>> +  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps%0Atagsfnodes unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024
>>     remote: 1
>>     preparing listkeys for "bookmarks"
>>     sending listkeys command
>>     received listkey for "bookmarks": 45 bytes
>>
> --
> Pierre-Yves David
>