PATCH improve cmdutil.findrenames to use threshold

Alil Adamov diagiman at gmail.com
Fri Mar 19 02:04:52 CDT 2010


2010/3/18 Benoit Boissinot <bboissin at gmail.com>

> On Thu, Mar 18, 2010 at 4:19 PM, Alil Adamov <diagiman at gmail.com> wrote:
> > This activates threshold use in findrenames
> > it is used in rename guess dialog of TortoiseHg
>
> Could you add more information on why you need it?
> To me it looks like it makes hg less efficient, and it's probably
> buggy (I don't know how we handle duplicate entries).
>
> regards,
>
> Benoit
> >
> > --- a/mercurial/cmdutil.py
> > +++ b/mercurial/cmdutil.py
> > @@ -287,7 +287,7 @@
> >
> >  def findrenames(repo, added, removed, threshold):
> >      '''find renamed files -- yields (before, after, score) tuples'''
> > -    copies = {}
> > +    copies = []
> >      ctx = repo['.']
> >      for i, r in enumerate(removed):
> >          repo.ui.progress(_('searching'), i, total=len(removed))
> > @@ -321,5 +321,4 @@
> >              return equal * 2.0 / lengths
> >
> >          for a in added:
> > -            bestscore = copies.get(a, (None, threshold))[1]
> >              myscore = score(repo.wread(a))
> > @@ -325,5 +324,5 @@
> >              myscore = score(repo.wread(a))
> > -            if myscore >= bestscore:
> > -                copies[a] = (r, myscore)
> > +            if myscore >= threshold:
> > +                copies += [(a, r, myscore)]
> >      repo.ui.progress(_('searching'), None)
> >
> > @@ -328,7 +327,6 @@
> >      repo.ui.progress(_('searching'), None)
> >
> > -    for dest, v in copies.iteritems():
> > -        source, score = v
> > +    for dest, source, score in copies:
> >          yield source, dest, score
> >
> >  def addremove(repo, pats=[], opts={}, dry_run=None, similarity=None):
> >
>

It is used in the TortoiseHg rename guess dialog, which currently shows only a
single match pair for each file. Similarity is not used at all.
So the user gets only the highest-scoring pair, regardless of similarity and
the presence of other candidates.

cmdutil.findrenames has a threshold parameter which is not used.
But TortoiseHg and Mercurial's addremove (as far as I can see, the only other
caller) assume that it is used.

If the original design was to not allow duplicates then:

--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -323,8 +323,8 @@
         for a in added:
             bestscore = copies.get(a, (None, threshold))[1]
             myscore = score(repo.wread(a))
-            if myscore >= bestscore:
+            if myscore >= threshold and myscore >= bestscore:
                 copies[a] = (r, myscore)
     repo.ui.progress(_('searching'), None)

     for dest, v in copies.iteritems():

and TortoiseHg can create its own findrenames.
But IMO it is "core" functionality (maybe use an optional parameter).
It would be better to keep the meaning of similarity the same in Mercurial
and TortoiseHg.
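
To make the two behaviours concrete, here is a rough standalone sketch (not
Mercurial code). The function name pick_renames and the keep_all flag are
made up for illustration; the input is assumed to be precomputed
(removed, added, score) triples rather than a repo.

```python
def pick_renames(scores, threshold, keep_all=False):
    """Select rename candidates from (removed, added, score) triples.

    Hypothetical helper, not part of Mercurial.
    keep_all=False: at most one source per added file, best score wins.
    keep_all=True:  every pair whose score reaches the threshold.
    Returns a list of (source, dest, score) tuples.
    """
    if keep_all:
        # Report every candidate at or above the threshold.
        return [(r, a, s) for r, a, s in scores if s >= threshold]

    best = {}
    for r, a, s in scores:
        # The threshold acts as the initial "score to beat" for each file.
        if s >= best.get(a, (None, threshold))[1]:
            best[a] = (r, s)
    return [(r, a, s) for a, (r, s) in best.items()]


scores = [
    ("old1.py", "new.py", 0.9),
    ("old2.py", "new.py", 0.7),
    ("old3.py", "new.py", 0.3),
]
single = pick_renames(scores, 0.5)
everything = pick_renames(scores, 0.5, keep_all=True)
```

With these made-up scores, the default call keeps only the old1.py pair,
while keep_all=True also reports old2.py as a candidate; old3.py falls below
the threshold in both cases.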

-- 
Regards,
Alil Adamov