[PATCH 5 of 7] revlog: add a fast path for non-raw revision

Ryan McElroy rm at fb.com
Wed Mar 29 08:41:21 EDT 2017


On 3/28/17 8:49 AM, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1490683355 25200
> #      Mon Mar 27 23:42:35 2017 -0700
> # Node ID aecce2adbd64b25325e559798baa617e7311e85f
> # Parent  a9d87712bec99abe109c155948ee4b7f1f5ec208
> # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e=
> #              hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e=  -r aecce2adbd64
> revlog: add a fast path for non-raw revision
>
> If flags are empty, it's equal to "raw=True". Since "raw=True" is less
> common but "flags are empty" is very common, add the flags check. So on
> cache hit, revlog.revision could go through the fast path in the most common
> cases.
>
> This also makes some hash check unnecessary, so the test is changed a bit.

Thanks for explaining this. It's a non-obvious change :-)

>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1258,4 +1258,5 @@ class revlog(object):
>   
>           cachedrev = None
> +        flags = None

Why is this initialized to None here? Why not initialize to 
self.flags(rev) and avoid the two None checks later? Add a comment if 
you keep it this way. The self.flags method looks quite cheap but maybe 
I'm missing something.

>           if node == nullid:
>               return ""
> @@ -1264,4 +1265,9 @@ class revlog(object):
>                   if raw:
>                       return self._cache[2]
> +                if rev is None:
> +                    rev = self.rev(node)
> +                flags = self.flags(rev)
> +                if not flags:
> +                    return self._cache[2]

I think this needs some comments and perhaps some refactoring.

I'd suggest something like this:

# useful comment
if raw or not self.flags:

This would be made possible if you eagerly initialize flags above.

Upon closer inspection, maybe its self.rev() that you're trying to avoid 
here, but is the expense there the radix tree? In this case, it should 
always be a full hash so we shouldn't need a radixtree right? Can we 
fastpath this somehow instead?

That may be out of the scope of this series but a comment to explain 
your choice is important.

>               cachedrev = self._cache[1]
>   
> @@ -1285,6 +1291,7 @@ class revlog(object):
>           text = mdiff.patches(text, bins)
>   
> -        newtext, validatehash = self._processflags(text, self.flags(rev),
> -                                                   'read', raw=raw)
> +        if flags is None:
> +            flags = self.flags(rev)
> +        newtext, validatehash = self._processflags(text, flags, 'read', raw=raw)
>           if validatehash:
>               self.checkhash(newtext, node, rev=rev)
> diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out
> --- a/tests/test-revlog-raw.py.out
> +++ b/tests/test-revlog-raw.py.out
> @@ -23,5 +23,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
>     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2 y   02T35   1 +012T34       2 02T35
> -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2     02T35   2 02T35         2 02T35
>   
> @@ -44,5 +43,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
>     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2 y   02T35   None            2 02T35
> -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2     02T35   2 02T35         2 02T35
>     1 y   +012T34 2 02T35         1 +012T34
>



More information about the Mercurial-devel mailing list