Using qemu.git (77K commits), I see that a simple "hg bookmarks" takes a surprisingly long time (20 seconds). The problem is that we're missing the "don't reindex" optimization, because cur_cache_heads == cache_heads comparison is false. Why? Our sqlite3 connection has .text_factory = bytes, so cache_heads is in bytes. pygit2, on the other hand, returns Python strings, so cur_cache_heads is in strings. I don't know which way it should be, so I don't know if this change is appropriate, but it fixes the problem for me: --- a/hgext/git/index.py +++ b/hgext/git/index.py @@ -245,7 +245,7 @@ def _index_repo(gitrepo, db, progress_fa # TODO: we should figure out how to incrementally index history # (preferably by detecting rewinds!) so that we don't have to do a # full changelog walk every time a new commit is created. - cache_heads = {x[0] for x in db.execute('SELECT node FROM possible_heads')} + cache_heads = {x[0].decode("utf-8") for x in db.execute('SELECT node FROM possible_heads')} walker = None cur_cache_heads = {h.hex for h in possible_heads} if cur_cache_heads == cache_heads:
Fixed by https://mercurial-scm.org/repo/hg/rev/fb2936c5f6dc Hollis Blanchard <hollis_blanchard@mentor.com> git: decode node IDs back into Python strings (issue6349) db.text_factory = bytes, so the database contains only strings. The object IDs we get from pygit2 are Python strings. b'foo' != 'foo' This change allows the "don't reindex" optimization to work by allowing the "cur_cache_heads == cache_heads" comparison a few lines down to succeed. Differential Revision: https://phab.mercurial-scm.org/D8622 (please test the fix)
Bug was set to TESTING for 7 days, resolving