[PATCH 8 of 9] localrepo: add findbyhash for convenience

Mon Mar 30 11:58:56 CDT 2015

At Sun, 29 Mar 2015 12:17:00 -0700,
Gregory Szorc wrote:
> 
> [1  <text/plain; UTF-8 (7bit)>]
> On Sun, Mar 29, 2015 at 11:34 AM, FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> wrote:
> 
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1427653752 -32400
> > #      Mon Mar 30 03:29:12 2015 +0900
> > # Node ID 8a0e73ddbd401236f42f0afd65f254c4cdcdce08
> > # Parent  ce47b26e3c0d6104b5bfb2d15e59b5a2af573956
> > localrepo: add findbyhash for convenience
> >
> > This allows to use "context.findbyhash()" easily, without additional
> > importing "context".
> >
> > Using "repo.findbyhash(hashid)" instead of "hashid in repo" is also
> > useful to avoid meaningless matching against "repo.dirstate.p1()",
> > reserved names (e.g. "null", "tip", "."), bookmarks, tags, branches
> > and so on.
> >
> > Matching against "repo.dirstate.p1()" is especially inefficient,
> > because it may cause useless (re-)loading from ".hg/dirstate" or
> > invoking "changelog.rev(bin(hashid))" for validation.
> >
> > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > --- a/mercurial/localrepo.py
> > +++ b/mercurial/localrepo.py
> > @@ -1915,6 +1915,20 @@ class localrepository(object):
> >          """
> >          return context.findbyrevnum(self, revnum, abort=abort)
> >
> > +    def findbyhash(self, hashid, abort=False):
> > +        """Find the revision by hash ID in this repository
> > +
> > +        ``hashid`` should be ``str``.
> > +
> > +        This returns ``(node, revnum)`` tuple, if the revision
> > +        corresponded to the specified ``hashid`` is found. If that
> > +        revision is hidden one, this raises FilteredRepoLookupError.
> > +
> > +        Other wise, this returns ``None`` (or raises RepoLookupError if
> > +        ``abort``).
> > +        """
> > +        return context.findbyhash(self, hashid, abort=abort)
> > +
> >  # used to avoid circular references so destructors work
> >  def aftertrans(files):
> >      renamefiles = [tuple(t) for t in files]
> >
> 
> This patch is related to tags and names performance issues. When using
> "node in repo" or "repo[node]" with a 20 or 40 byte value that must be a
> node but isn't present, context.changectx.__init__ falls back to names
> lookup [1]. This can be quite expensive for the tags cache, as it means
> calculating heads and .hgtags filenodes. It seems like avoidable overhead.
> 
> For consumers that know they are operating on node values and not symbolic
> names (wire protocol, caches, etc), I think it makes sense to have a "get
> context" API that only operates on rev or node inputs. This is very similar
> to what's implemented in this patch series. The big difference is I'd be
> looking for something that returns a context instance, not just rev and
> node. How would such an API fit into what you've implemented here?

I mainly focuses on making examination by "n in repo",
"repo.changelog.rev(n)" and so on efficient, and returning "(node,
rev)" is enough for such purpose.

But I can understand the need to get context strictly and efficiently, too.

Which would you like specifications below ?

  - always return "(node, rev, changectx)"

  - return "changectx" if "getcontext=True" or such argument, or
    return "(node, rev)" otherwise

> [1] http://selenic.com/repo/hg/file/efa094701a05/mercurial/context.py#l430
> [2  <text/html; UTF-8 (quoted-printable)>]
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp