[PATCH 0 of 1] context: reduce use of expensive cmp even further

Mon Oct 18 03:42:20 CDT 2010

* Nicolas Dumazet on Monday, October 18, 2010 at 01:45:22 +0200
> 2010/10/15 Christian Ebert <blacktrash at gmx.net>:
>> This might be overkill. On the other hand it checks for the
>> presence of _encodefilterpats first, and for those who are
>> encoding this might still be faster.
> 
> Well I don't know if we want this, but it's interesting to explore the
> possibility.
> I actually do not know so much about encode filters: just that I had
> to avoid them for size checks because they are likely to change
> content size.
> Do we have a list/an idea of how/in which case they're supposed to be used?

One use is certainly hgext.eol.

> If, in average, we have very few filters, then it might save us some
> time. It would be worth to check what's happening.
> Would you know typical use cases, Christian? Would you be able to run
> some tests and come back with figures, to check if we're actually
> saving time with this patch?

Unfortunately I think you could come up easily with proofs for
both alternatives, as it entirely depends on the configuration.

In general, I'd say: the lower the ratio
files to be encoded/decoded vs. files to be left alone
the more time you save with this patch.

    def cmp(self, fctx):
        fmatch = False
        # the following condition (already there w/o the patch)
        # already excludes all cases where we don't compare
        # against the working directory
        if fctx._filerev is None:
            path = fctx.path() # extra call introduced by patch
            for mf, fn, cmd in self._repo._encodefilterpats:
                if mf(path):   # extra call for each filterpattern
                    fmatch = True
                    break
        if fmatch or self.size() == fctx.size():
            return self._filelog.cmp(self._filenode, fctx.data())

The first extra call -- fctx.path() -- should be neglectably
cheap. Now it depends on how many and how complicated match
filter match (functions) are configured by the user and whether
calling fctx.data() is more expensive. I think the only reliable
answer is: It depends.

If there are no encode/decode patterns configured (majority of
use cases, I _guess_) the difference is (when comparing against
the working directory):

+ fctx.path()
- if self._repo._encodefilterpats:
vs.
+ loop over empty self._repo._encodefilterpats

localrepo's filter function (called with every wread(), wwrite(),
wwritedata()) might be worth a look:

    def _filter(self, filterpats, filename, data):
        for mf, fn, cmd in filterpats:
            if mf(filename):
                self.ui.debug("filtering %s through %s\n" % (filename, cmd))
                data = fn(data, cmd, ui=self.ui, repo=self, filename=filename)
                break

        return data

> If we do save time, then of course I'd push this patch and its keyword brother.

It depends ;-)

c
-- 
theatre - books - texts - movies
Black Trash Productions at home: http://www.blacktrash.org
Black Trash Productions on Facebook:
http://www.facebook.com/blacktrashproductions