Linkrev perf

Jun Wu quark at fb.com
Tue Sep 27 11:24:29 EDT 2016


I changed the subject to reflect the actual topic.

The map (or whatever structure) replacing a single linkrev with a list of
"introrev"s (or a flag indicating it is not unique) is just a hack.

The long-term solution would be something different: ideally, each fctx has
one unique linkrev, and any filelog is a subgraph of the changelog.

It seems we have to add some noise when creating a new filenode to address
the problem. But a hash change is a big BC. So one crazier idea is to extend
the hash - we still have 16 unused bytes for nodeid everywhere. It would
look like:

  - filenode would have extra N bytes for each revision, purely noise
  - manifest would be the complex part - we need to record those N bytes for
    each file, while avoiding a BC or hash change.
    The solution would be to have a separate manifest storing just those
    noise bytes for each file path, so the hash of the main manifest does
    not change - thus staying compatible with old clients.
    The manifest nodeid would be extended by N bytes as well.
  - changelog nodeid would have a similar change - extra N bytes
    representing the noise.
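To make the idea concrete, here is a minimal Python sketch of the
extended-nodeid scheme. The names standard_nodeid, extended_nodeid, and
NOISE_LEN are all hypothetical, and the choice of N is arbitrary:

```python
import hashlib
import os

NOISE_LEN = 4  # hypothetical N; any value fitting in the unused bytes works


def standard_nodeid(text, p1, p2):
    """Classic Mercurial nodeid: sha1 over the sorted parents plus the text
    (20 bytes)."""
    s = hashlib.sha1()
    s.update(min(p1, p2) + max(p1, p2))
    s.update(text)
    return s.digest()


def extended_nodeid(text, p1, p2):
    """Append N random noise bytes so otherwise-identical filenodes differ.

    The first 20 bytes are unchanged, so old clients that only look at the
    classic prefix keep working; the noise itself would live in the
    separate noise manifest."""
    noise = os.urandom(NOISE_LEN)
    return standard_nodeid(text, p1, p2) + noise
```

Because the noise is random per revision, two revisions that reintroduce
the same content with the same parents would still get distinct extended
nodeids, which is exactly what makes linkrevs unique again.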

That sounds ugly anyway. If we can make a big BC on the hash (just like
treemanifest), I think it may be the time to change more things, like:

  - Include the order of p1, p2 in the hash
  - Like git, hash file blobs without parentids - call it blobid - so we
    can reuse blobs for copies, reverts, and across repos.
  - Generate some noise (like, uuid) for each changeset being created
    (thanks Durham for this idea), in filelog, let nodeid = hash((blobid,
    p1, p2, intro-changeset-uuid)). The uuid could be reused for the amend
    case.
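A rough sketch of that hashing scheme; blobid, filelog_nodeid, and the
exact byte layout are assumptions for illustration, not a spec:

```python
import hashlib
import uuid


def blobid(content):
    """Hash file content alone (no parentids), like a git blob, so
    identical contents share one blob across copies, reverts, and repos."""
    return hashlib.sha1(b"blob:" + content).digest()


def filelog_nodeid(content, p1, p2, intro_uuid):
    """nodeid = hash((blobid, p1, p2, intro-changeset-uuid)).

    Note the parent order participates in the hash, unlike today's
    sorted-parents scheme."""
    s = hashlib.sha1()
    s.update(blobid(content))
    s.update(p1)
    s.update(p2)
    s.update(intro_uuid)
    return s.digest()


# each new changeset would get a fresh uuid; amend could reuse the old one
intro = uuid.uuid4().bytes
```

The per-changeset uuid is what guarantees two different introducing
changesets never produce the same filenode, even for identical content and
parents.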

Notes:

  - The noise is a dig at my own "reduce-noise" attempt. I was planning to
    move amend_source etc. to obsstore. But on second thought, I believe we
    have to have some kind of noise for good performance on single-file
    traversal.
  - We could provide a command to hash the tree if we have the blob layer,
    for users who want to check whether two commits have exactly the same
    file contents.
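The tree-hash command in the second note could work roughly like this;
treehash and the manifest-as-dict representation are hypothetical:

```python
import hashlib


def treehash(manifest):
    """Hash the full tree contents, where manifest maps path -> blobid.

    Two commits with exactly the same file contents get the same treehash,
    regardless of their history, because blobids ignore parentids."""
    s = hashlib.sha1()
    for path, bid in sorted(manifest.items()):
        s.update(path + b"\0" + bid + b"\n")
    return s.hexdigest()
```

Sorting by path makes the hash independent of manifest iteration order, so
the comparison is purely about contents.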

Excerpts from Pierre-Yves David's message of 2016-09-27 16:08:33 +0200:
> On 09/24/2016 07:59 PM, Jun Wu wrote:
> > By the way, I just encountered an _adjustlinkrev related perf issue when
> > working on fastannotate internally. introrev (_adjustlinkrev) can take
> > seconds (and sadly it just returns ctx.linkrev() most of the time).
> > Seconds is unacceptable compared with the annotate logic, which only
> > takes tens of milliseconds.
> 
> Did you check that the ancestry context was properly inherited from one
> check to another? This helps with the cost of adjustlinkrev.
> 
> > I think a stupid way to make it better is to have a map of {fctxnode:
> > [linknode]}. To make the map small, it only stores entries where
> > len([linknode]) > 1. The challenge would be how to keep the map up-to-date.
> 
> A smaller step would be to have a fast way to know if a fnode is used 
> more than once. We can skip linkrev adjustment for anything with only 
> one entry.
> 
> A "multiple entry flag" has an interesting similarity with other
> properties we are considering bitmaps for: "public", "obs". It is a
> "forward only" property and we could reuse transaction-safe logic for
> all of them.
> 
> 
> Building a very basic version of this (eg: no cache invalidation, simple
> json on disk) would be a good way to get some numbers and see how much it
> actually improves performance.
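A very basic version along the lines Pierre-Yves suggests - no cache
invalidation, simple json on disk - might look like this; build_dupmap,
needs_adjustment, and the json layout are made up for illustration:

```python
import collections
import json


def build_dupmap(filelog_entries):
    """Map fnode -> [linkrev], keeping only fnodes used more than once.

    filelog_entries is an iterable of (fnode, linkrev) pairs; any fnode
    with a single entry needs no linkrev adjustment, so it is dropped to
    keep the map small."""
    m = collections.defaultdict(list)
    for fnode, linkrev in filelog_entries:
        m[fnode].append(linkrev)
    return {n: revs for n, revs in m.items() if len(revs) > 1}


def save(path, dupmap):
    """Persist the map as plain json, hex-encoding the binary fnodes."""
    with open(path, "w") as f:
        json.dump({n.hex(): revs for n, revs in dupmap.items()}, f)


def needs_adjustment(dupmap, fnode):
    """Skip the expensive _adjustlinkrev walk for unique fnodes."""
    return fnode in dupmap
```

Since absence from the map means "unique fnode", the common case becomes a
single dictionary lookup instead of an ancestry walk.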

