[PATCH] revlog: speed up prefix matching against nodes

Matt Mackall mpm at selenic.com
Fri May 4 12:22:23 CDT 2012


On Wed, 2012-05-02 at 16:37 -0700, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1336001830 25200
> # Branch stable
> # Node ID 365a246d027bb04223f52cdeab9772aafab5d83e
> # Parent  c82db2287f3b11d191dd46240fdb9439f9a72448
> revlog: speed up prefix matching against nodes
> 
> The radix tree already contains all the information we need to
> determine whether a short string is an unambiguous node identifier.
> We now make use of this information.
> 
> In a kernel tree, this improves the performance of
> "hg log -q -r24bf01de75" from 0.27 seconds to 0.06.

> +	if (allhex(orig, origlen)) {
> +		bin = unhexlify(orig, origlen & ~1);

This is an odd way to do things, I think. We do hex->bin in the Python
version so we can do startswith(bin) and deal with the issue of
odd-length strings after (which seems to be missing here), but here we
really want to work a nibble at a time. So it'd probably better to pass
the string into nt_find directly with a hex flag.

> +	if (rev == -2 && bin != NULL) {
> +		/* Tiny chance that the hex string we decoded was actually
> +		 * binary that happened to be all hex digits. */

We should never attempt to do a binary partial match so I don't think
this can happen.

Functions like nt_init should not appear in a patch like this. If you
need to move code or do other cleanups to prepare to add something,
please send them first. That lets me queue them and take them off your
plate and have an easier patch to review.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list