[PATCH] manifest: improve filesnotin performance by using lazymanifest diff

Tony Tung tonytung at instagram.com
Wed Apr 27 11:32:56 EDT 2016


Comments inline.

On Apr 27, 2016, at 6:20 AM, Martin von Zweigbergk <martinvonz at google.com> wrote:



On Wed, Apr 27, 2016, 01:07 Sean Farley <sean at farley.io> wrote:

Tony Tung <ttung at fb.com> writes:

> # HG changeset patch
> # User Tony Tung <tonytung at merly.org>
> # Date 1461740718 25200
> #      Wed Apr 27 00:05:18 2016 -0700
> # Branch stable
> # Node ID 7f80dce78781f5fe691a23f1b7f5a110ed170f32
> # Parent  97811ff7964710d32cae951df1da8019b46151a2
> manifest: improve filesnotin performance by using lazymanifest diff
>
> lazymanifests can compute diffs significantly faster than taking the set
> of two manifests and calculating the delta.

FYI, we're currently in a feature freeze:

https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan

Will resubmit.


> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -211,8 +211,10 @@
>
>      def filesnotin(self, m2):
>          '''Set of files in this manifest that are not in the other'''
> -        files = set(self)
> -        files.difference_update(m2)
> +        diff = self.diff(m2)
> +        files = set(filepath
> +                    for filepath, hashflags in diff.items()

iteritems() may be noticeably faster on large diffs

I was under the impression that items() had the same performance characteristics as iteritems(), but apparently that’s only true in Python 3. Will fix.
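For context on the items() vs. iteritems() point: in Python 2, dict.items() builds a full list of pairs, while dict.iteritems() yields them lazily; in Python 3, items() returns a view and iteritems() no longer exists. A minimal sketch of a helper that iterates lazily on either version follows. The name `iteritems` as a free function is purely illustrative and is not taken from the patch or from Mercurial:

```python
def iteritems(d):
    """Iterate over (key, value) pairs without building an intermediate list.

    Hypothetical helper for illustration only.
    """
    if hasattr(d, 'iteritems'):
        # Python 2: iteritems() is the lazy variant; items() would copy.
        return d.iteritems()
    # Python 3: items() is already a lightweight view.
    return iter(d.items())


pairs = sorted(iteritems({'a': 1, 'b': 2}))
```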


> +                    if hashflags[1][0] is None)

(for after May 1st) Would it be feasible to have a perf test for this
(and some sweet, sweet performance numbers in the commit message)?

As usual, I will request real-world perf numbers (instead of, or in addition to, a perf test). It would be nice to have them for both a good case (small diff) and a bad case (large diff). Thanks.

hg diff -c . on Facebook’s large repo takes 1.5s instead of 2.1s with this change.  In the case of large diffs, I suspect the performance regression would be drowned out by the file system operations.
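As a rough illustration of what the patch changes, the sketch below compares the two ways of computing filesnotin using plain dicts. It assumes, as the `hashflags[1][0] is None` test in the patch suggests, that a diff maps each changed path to a pair of (node, flags) entries, with a node of None on the side where the file is absent. The `toy_diff` function is a hypothetical stand-in, not lazymanifest's implementation, and it shows only that the two approaches agree, not the speedup:

```python
def toy_diff(m1, m2):
    """Toy stand-in for a manifest diff (assumed shape, not Mercurial's code).

    Returns {path: ((node1, flags1), (node2, flags2))} for paths that differ.
    """
    missing = (None, '')
    result = {}
    for path in set(m1) | set(m2):
        e1 = m1.get(path, missing)
        e2 = m2.get(path, missing)
        if e1 != e2:
            result[path] = (e1, e2)
    return result


def filesnotin_set(m1, m2):
    """Old approach: materialize every path, then subtract."""
    files = set(m1)
    files.difference_update(m2)
    return files


def filesnotin_diff(m1, m2):
    """New approach: keep diff entries whose second side has no node."""
    return set(path
               for path, hashflags in toy_diff(m1, m2).items()
               if hashflags[1][0] is None)


m1 = {'a': ('n1', ''), 'b': ('n2', ''), 'c': ('n3', 'x')}
m2 = {'a': ('n1', ''), 'b': ('n9', ''), 'd': ('n4', '')}
old_result = filesnotin_set(m1, m2)
new_result = filesnotin_diff(m1, m2)
```

Here 'b' is modified on both sides and 'd' exists only in m2, so neither is reported; only 'c' is in m1 but not in m2.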

