filectx: strange performance behavior

Mon May 11 20:52:02 CDT 2015

On Fri, 2015-05-08 at 20:19 +0200, Marc Strapetz wrote:
> As part of an extension, I'm using the following code, which takes
> several seconds to execute for a large repository (e.g. setup.py of
> the Mercurial repository):
> 
>    filectx = repo['.'][path]
>    filectxs = []
>    for ctx in filectx.ancestors():
>      filectxs.append(ctx)
> 
>    # this loop takes a long time
>    for ctx in filectxs:
>      ui.write(str(ctx.rev()) + '\n')
> 
> Interestingly, rewriting the loop to the following code gives results 
> almost instantly:
> 
>    filectx = repo['.'][path]
> 
>    # this loop executes almost instantly
>    for ctx in filectx.ancestors():
>      ui.write(str(ctx.rev()) + '\n')
> 
> Also, when adding an additional rev(), the original loop becomes fast:
> 
>    filectx = repo['.'][path]
>    filectxs = []
>    for ctx in filectx.ancestors():
>      ctx.rev() # <-- this makes the loop fast
>      filectxs.append(ctx)
> 
>    # now the loop executes almost instantly
>    for ctx in filectxs:
>      ui.write(str(ctx.rev()) + '\n')
> 
> Probably this is expected, but my Mercurial/Python knowledge is too 
> limited to understand that ... so any hints are appreciated :)

Short answer: it's magic.

Long answer: the mapping between file revisions and changesets is not
1:1, so determining which rev() goes with a file context is not always
trivial. Your fast loops call rev() during the ancestors() walk, which
makes this work easier: when we create each new filectx, we know
roughly how we got there in changeset terms. Your slow loop first calls
rev() after the walk has finished, and by then we don't have that
information readily available, because we were never asked to compute
it along the way.
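
If all you need is the revision numbers, one way to stay on the fast
path is to ask for rev() while the ancestors() walk is in progress, as
in the two fast snippets quoted above. Here is a minimal sketch of that
pattern. The helper name ancestor_revs is made up for illustration, and
FakeFileCtx is only a stand-in so the snippet runs without a
repository; it is not Mercurial code and says nothing about how rev()
is computed internally. The only calls assumed on a real filectx are
ancestors() and rev(), both taken from your code.

```python
def ancestor_revs(fctx):
    """Collect the changeset revision of every ancestor of fctx.

    rev() is called while ancestors() is still being walked, which is
    the pattern that was fast in the snippets quoted above.
    """
    return [ancestor.rev() for ancestor in fctx.ancestors()]


class FakeFileCtx(object):
    """Stand-in for a filectx (an assumption, not Mercurial code)."""

    def __init__(self, rev, parent=None):
        self._rev = rev
        self._parent = parent

    def rev(self):
        return self._rev

    def ancestors(self):
        # Yield each ancestor in turn, nearest first.
        ctx = self._parent
        while ctx is not None:
            yield ctx
            ctx = ctx._parent


# A three-entry linear history: root <- middle <- tip
root = FakeFileCtx(0)
middle = FakeFileCtx(3, parent=root)
tip = FakeFileCtx(7, parent=middle)
print(ancestor_revs(tip))  # prints [3, 0]
```

In an extension you would call it as ancestor_revs(repo['.'][path]) and
write the resulting integers out afterwards, rather than keeping the
filectx objects around and asking them for rev() later.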

-- 
Mathematics is the supreme nostalgia of our time.