[PATCH 1 of 5] findrenames: Separate repository access commands from similarity algorithm
Benoit Boissinot
benoit.boissinot at ens-lyon.org
Sun Mar 7 11:44:09 CST 2010
On Sun, Mar 07, 2010 at 04:12:48AM -0000, David Greenaway wrote:
> # HG changeset patch
> # User David Greenaway <hg-dev at davidgreenaway.com>
> # Date 1267934964 -39600
> # Node ID 10649eca0e852b7f229e392f36812bbd6f89773c
> # Parent 033d2fdc3b9d3e33fd33d45109aafdb4a5cb3273
> findrenames: Separate repository access commands from similarity algorithm.
>
> The current 'findrenames' function mixes concerns of retrieving data from the
> repository with actually computing similarity between old and new files.
> This patch splits out data retrieval back into addremove(), leaving the
> pure similarity detection algorithm in findrenames().
I'm not sure this is the way to go. If you want to separate out the
similarity algorithm, just create a new function (maybe in context.py?).
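
To make the suggestion concrete, here is a rough sketch of a standalone
scoring function. It is not code from the patch or from Mercurial: the name
similarity() is hypothetical, and difflib from the standard library stands in
for Mercurial's own line-matching code. The point is only that the comparison
itself needs two texts and nothing from the repository.

```python
import difflib


def similarity(old_text, new_text):
    """Return a score in [0.0, 1.0] for how similar two file bodies are.

    Hypothetical helper for illustration only. difflib is used here as a
    stand-in for Mercurial's own diff machinery.
    """
    # An empty file is never treated as a rename source or target.
    if not old_text or not new_text:
        return 0.0
    # Identical content is a perfect match; skip the line-by-line work.
    if old_text == new_text:
        return 1.0
    # Compare line by line, keeping line endings so lines stay distinct.
    old_lines = old_text.splitlines(True)
    new_lines = new_text.splitlines(True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # ratio() is 2 * matched_lines / (len(old_lines) + len(new_lines)).
    return matcher.ratio()
```

A caller that already holds both texts could then filter candidates with
something like `similarity(old, new) >= threshold`.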
> Upcoming changes will increase the complexity of findrenames(), making these
> changes desirable. Additionally, separating the two allows findrenames() to be
> used from callers in other contexts in the future.
I really don't think you should abstract data retrieval this way; the
caller should have contexts anyway.
cheers,
Benoit
>
> diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
> --- a/mercurial/cmdutil.py
> +++ b/mercurial/cmdutil.py
> @@ -285,23 +285,26 @@
>  def matchfiles(repo, files):
>      return _match.exact(repo.root, repo.getcwd(), files)
>
> -def findrenames(repo, added, removed, threshold):
> -    '''find renamed files -- yields (before, after, score) tuples'''
> +def findrenames(added, removed, threshold):
> +    """
> +    Given two lists of files, yield (source, destination, score) tuples of
> +    similar files.
> +
> +    The input 'added' and 'removed' lists should be lists of tuples containing
> +    (filename, function to retrieve file data). The retrieval functions will
> +    be given a single argument: the name of the file to retrieve.
> +    """
>      copies = {}
> -    ctx = repo['.']
> -    for r in removed:
maybe just pass filectx objects in added/removed
> -        if r not in ctx:
> -            continue
> -        fctx = ctx.filectx(r)
> +    for (r, r_data) in removed:
> +        orig = r_data(r)
>
>          def score(text):
>              if not len(text):
>                  return 0.0
> -            if not fctx.cmp(text):
> +            if orig == text:
then you can keep the optimized comparison (fctx.cmp) here
>                  return 1.0
>              if threshold == 1.0:
>                  return 0.0
> -            orig = fctx.data()
and load the text lazily, only when it is actually needed
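
A rough sketch of the shape suggested in the comments above, under stated
assumptions: StubFileCtx is a hypothetical stand-in written for this example,
not Mercurial's filectx, and it only mimics the two methods the original code
calls (cmp(), which returns true when the text differs, and data()). The
digest check and the difflib scoring are likewise stand-ins. What it shows is
that the cheap comparison runs first and the full text is fetched only when a
fuzzy score is actually needed.

```python
import difflib
import hashlib


class StubFileCtx:
    """Hypothetical stand-in for a file context (not Mercurial's class)."""

    def __init__(self, path, content):
        self._path = path
        self._content = content  # pretend this lives in the repository
        self._digest = hashlib.sha1(content).hexdigest()
        self.data_calls = 0      # counts how often the full text is fetched

    def path(self):
        return self._path

    def cmp(self, text):
        """Return True if text differs from the stored content.

        Only digests are compared, so the stored text is not fetched.
        """
        return hashlib.sha1(text).hexdigest() != self._digest

    def data(self):
        """Fetch the full stored text (the expensive step in this sketch)."""
        self.data_calls += 1
        return self._content


def make_scorer(fctx, threshold):
    """Build a score(text) function for one removed file."""

    def score(text):
        if not len(text):
            return 0.0
        if not fctx.cmp(text):   # cheap check first
            return 1.0
        if threshold == 1.0:     # only exact matches wanted
            return 0.0
        orig = fctx.data()       # loaded lazily, only on this path
        matcher = difflib.SequenceMatcher(
            None, orig.splitlines(True), text.splitlines(True), autojunk=False)
        return matcher.ratio()

    return score
```

With this layout, findrenames() would receive objects exposing cmp() and
data() instead of (filename, retrieval function) tuples, and exact matches
never trigger a full read.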
--
:wq