[PATCH 2 of 5] findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes
Mads Kiilerich
mads at kiilerich.com
Sun Mar 21 15:21:17 CDT 2010
David Greenaway wrote, On 03/21/2010 08:33 AM:
> # HG changeset patch
> # User David Greenaway<hg-dev at davidgreenaway.com>
> # Date 1269151891 -39600
> # Node ID 153c602f7b20c95484033d2b91d05171f62f91b2
> # Parent 8bf2c8f1b35705f4f5b615aaf954a3cb034c5dc0
> findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
>
...
> +def findrenames(repo, added, removed, threshold):
> + '''find renamed files -- yields (before, after, score) tuples'''
> + commitedctx = repo['.']
> + workingctx = repo[None]
>
> + # Fetch contexts for added and removed files.
> + addedfiles = [workingctx[fp] for fp in added]
> + removedfiles = [commitedctx[fp] for fp in removed if fp in commitedctx]
> +
> + #
> + # Zero length files will be frequently unrelated to each other, and
> + # tracking the deletion/addition of such a file will probably cause more
> + # harm than good. We strip them out here to avoid matching them later on.
> + #
>
(The Mercurial style seems to be compact code without blank lines -
especially in comments.)
> + addedfiles = [x for x in addedfiles if x.size()> 0]
> + removedfiles = [x for x in removedfiles if x.size()> 0]
>
Why not do this while building the "sets" in the first place?
> + # Find exact matches.
> + handledfiles = set()
> + for (a, b) in _findexactmatches(repo, addedfiles, removedfiles):
> + handledfiles.add(b)
> + yield (a, b, 1.0)
>
Shouldn't "a" be "removed" from removedfiles as well?
Perhaps it would be simpler to let addedfiles and removedfiles be sets
from the beginning and just remove handled files here. (The sets could
perhaps be converted to (non-mutated) lists in the call to
_findexactmatches.
> + # If the user requested similar files to be matched, search for
> + # them also.
> + if threshold< 1.0:
> + # Remove files already discovered.
> + addedfiles = [x for x in addedfiles if x.path() not in handledfiles]
> +
> + # Find partial matches.
> + for (a, b, score) in _findsimilarmatches(repo,
> + addedfiles, removedfiles, threshold):
> + yield (a, b, score)
>
/Mads
More information about the Mercurial-devel
mailing list