[PATCH 1 of 4] basectx: cache filectx generation for all contexts

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue Aug 19 01:23:02 CDT 2014



On 08/18/2014 07:42 PM, Sean Farley wrote:
> # HG changeset patch
> # User Sean Farley <sean.michael.farley at gmail.com>
> # Date 1408408678 18000
> #      Mon Aug 18 19:37:58 2014 -0500
> # Node ID 594d00e49cea6cd2ebb92c0570e7502f8797e232
> # Parent  8dda6f6ff564d8fe6ac7b8ce4c74eb9bfb5de14a
> basectx: cache filectx generation for all contexts
>
> Previously, filecontexts are generated on the fly but not cached.

Should be "were"

> We need to keep them around if we're going to allow the data inside
> to be changed.

So the motivation for this is the ability to store content when we forge 
a changeset in memory. You should make it more explicite

> This change is added to basectx so that other contexts that inherit can
> benefit as well.

I'm pretty sure we do not want filectx to be cache in the general case.

- There is people working with insanely huge working copy.

- There is tool keeping changectx around (like probably hgview) assuming 
its a cheap object

- The usual assumption is that Mercurial requires about O(<largest file 
size>) memory. You are making this O(<largest changesets content>)

I recommend taking the easiest road first. Implement caching only for 
"Content overwritten in a memctx". Other people will be eager to 
optimize anything as long as you have something working that can get 
some users.

Also, the filectx object keep a reference to the changectx. So you 
cannot hold a reference from the changectx to the filectx without 
introducing a cycle

> Currently, this is disabled for memctx objects that are passed a callable data
> store function.
>
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -31,10 +31,11 @@ class basectx(object):
>           o = super(basectx, cls).__new__(cls)
>
>           o._repo = repo
>           o._rev = nullrev
>           o._node = nullid

This would need some documentation about what is contains and when it is 
disabled.

> +        o._filectxs = {}
>
>           return o
>
>       def __str__(self):
>           return short(self.node())
> @@ -56,11 +57,20 @@ class basectx(object):
>
>       def __contains__(self, key):
>           return key in self._manifest
>
>       def __getitem__(self, key):
> -        return self.filectx(key)
> +        try:
> +            return self._filectxs[key]
> +        except TypeError:
> +            # filectx caching is not enabled
> +            return self.filectx(key)
> +        except KeyError:
> +            v = self.filectx(key)
> +            self._filectxs[key] = v
> +            return v
> +

 From what I read below, self._filectxs is either a dict of None. So you 
should:

1) use "self._filectxs is None" instead of catching a type error
2) use "self._filectxs.get(…)" instead of KeyError

>
>       def __iter__(self):
>           for f in sorted(self._manifest):
>               yield f
>
> @@ -1614,10 +1624,14 @@ class memctx(committablectx):
>                       copied = copied[0]
>                   return memfilectx(repo, path, fctx.data(),
>                                     islink=fctx.islink(), isexec=fctx.isexec(),
>                                     copied=copied, memctx=memctx)
>               self._filectxfn = getfilectx
> +        else:
> +            # disable filectx caching for users that send in custom function,
> +            # e.g. hg convert
> +            self._filectxs = None
>
>           self._extra = extra and extra.copy() or {}
>           if self._extra.get('branch', '') == '':
>               self._extra['branch'] = 'default'
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list