[PATCH] avoid some false positives for addremove -s

Alexis S. L. Carvalho alexis at cecm.usp.br
Thu Feb 15 02:31:09 CST 2007


Thus spake Erling Ellingsen:
> # HG changeset patch
> # User Erling Ellingsen <erlingalf at gmail.com>
> # Date 1171497082 -3600
> # Node ID 475fc420289b50577d59a293e7e24586d0c81ec4
> # Parent  5b1f663ef86d68ce11d70de8e5ab61d93341a18c
> avoid some false positives for addremove -s

This looks mostly ok, but gmail has the annoying tendency to mangle
inline patches.  Can you send it again as an attachment?

Some minor things:

> diff -r 5b1f663ef86d -r 475fc420289b mercurial/cmdutil.py
> --- a/mercurial/cmdutil.py	Tue Feb 06 16:12:22 2007 -0600
> +++ b/mercurial/cmdutil.py	Thu Feb 15 00:51:22 2007 +0100
> @@ -146,20 +145,31 @@ def walk(repo, pats=[], opts={}, node=No
>         yield src, fn, util.pathto(repo.getcwd(), fn), fn in exact
> 
> def findrenames(repo, added=None, removed=None, threshold=0.5):
> +    '''find renamed files -- yields (before, after, score) tuples'''
>     if added is None or removed is None:
>         added, removed = repo.status()[1:3]
> +    import bdiff

Put the import at the top of the file, with the other imports

>     ctx = repo.changectx()
>     for a in added:
>         aa = repo.wread(a)
> -        bestscore, bestname = None, None
> +        bestname, bestscore = None, threshold

>         for r in removed:
>             rr = ctx.filectx(r).data()
> -            delta = mdiff.textdiff(aa, rr)
> -            if len(delta) < len(aa):
> -                myscore = 1.0 - (float(len(delta)) / len(aa))
> -                if bestscore is None or myscore > bestscore:
> -                    bestscore, bestname = myscore, r
> -        if bestname and bestscore >= threshold:
> +
> +            # bdiff.blocks() returns blocks of matching lines
> +            # count the number of bytes in each
> +            equal = 0
> +            alines = mdiff.splitnewlines(aa)
> +            matches = bdiff.blocks(aa, rr)[:-1]
> +            for x1,x2,y1,y2 in matches:
> +                assert x2-x1 == y2-y1

This assert shouldn't be needed.  Well, asserts obviously should never
trigger, but if we really want that check, there are probably better
places for it.

> +                for line in alines[x1:x2]:
> +                    equal += len(line)
> +
> +            myscore = equal*2.0 / (len(aa)+len(rr))
> +            if myscore >= bestscore:
> +                bestscore, bestname = myscore, r

You used "bestname, bestscore" a few lines above and "bestscore,
bestname" here.  Can you use the same order in both?

> +        if bestname:
>             yield bestname, a, bestscore
> 
> def addremove(repo, pats=[], opts={}, wlock=None, dry_run=None,
> diff -r 5b1f663ef86d -r 475fc420289b tests/test-addremove-similar
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/tests/test-addremove-similar	Thu Feb 15 00:51:22 2007 +0100

> diff -r 5b1f663ef86d -r 475fc420289b tests/test-addremove-similar-2
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/tests/test-addremove-similar-2	Thu Feb 15 00:51:22 2007 +0100

I'd rather have both tests in the same test-addremove-similar file...

Thanks

Alexis


More information about the Mercurial-devel mailing list