[PATCH] automv: use 95 as the default similarity threshold

Martijn Pieters mj at zopatista.com
Tue Feb 16 16:20:34 UTC 2016


# HG changeset patch
# User Martijn Pieters <mjpieters at fb.com>
# Date 1455638312 0
#      Tue Feb 16 15:58:32 2016 +0000
# Node ID 8a113a93013288f1aacb0016ec93e61b8ff3ed08
# Parent  c2e526103b29a5091cf139da302b99ee674957f2
automv: use 95 as the default similarity threshold.

The motivation for the change from 100 to 95 is included in a comment.

* Update the tests to include a modification to a moved file that should
  still be detected as a move.

* Use ui.configint() to handle non-integer configuration entries more
  gracefully. Also abort if a similarity outside the acceptable range (0 to
  100) is set.
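
The validation described in the last bullet can be sketched outside Mercurial
roughly as follows. This is a minimal illustration and not the extension's
code: parse_similarity and SimilarityError are hypothetical names, whereas the
patch itself relies on ui.configint() and error.Abort.

```python
# Hypothetical stand-in for the check added by this patch. It does not import
# Mercurial; parse_similarity and SimilarityError are illustrative names only.

DEFAULT_SIMILARITY = 95


class SimilarityError(ValueError):
    """Raised for a non-integer or out-of-range similarity value."""


def parse_similarity(raw, default=DEFAULT_SIMILARITY):
    """Return the similarity threshold as an int between 0 and 100."""
    if raw is None:
        # Option not set: fall back to the default threshold.
        return default
    try:
        threshold = int(raw)
    except ValueError:
        # Reject values such as 'high' or '95.5' instead of crashing later.
        raise SimilarityError(
            'automv.similarity is not an integer: %r' % (raw,))
    if not 0 <= threshold <= 100:
        raise SimilarityError(
            'automv.similarity must be between 0 and 100')
    return threshold
```

With this sketch, parse_similarity(None) falls back to 95, while
parse_similarity('110') raises SimilarityError, mirroring the abort exercised
by the new "error conditions" test.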

diff --git a/hgext/automv.py b/hgext/automv.py
--- a/hgext/automv.py
+++ b/hgext/automv.py
@@ -11,14 +11,25 @@
 
 The threshold at which a file is considered a move can be set with the
 ``automv.similarity`` config option. This option takes a percentage between 0
-(disabled) and 100 (files must be identical), the default is 100.
+(disabled) and 100 (files must be identical), the default is 95.
 
 """
+
+# The default similarity of 95 is based on an analysis of the cpython,
+# mozilla-central & mercurial repositories, as well as 2 very large
+# Facebook repositories. At 95, 50% of all potential missed moves
+# would be caught, and 87% of all explicitly marked moves are at least
+# that similar. Together, 80% of moved files are at least 95%
+# similar.
+#
+# See http://markmail.org/thread/5pxnljesvufvom57 for context.
+
 from __future__ import absolute_import
 
 from mercurial import (
     commands,
     copies,
+    error,
     extensions,
     scmutil,
     similar
@@ -37,7 +48,9 @@
     renames = None
     disabled = opts.pop('no_automv', False)
     if not disabled:
-        threshold = float(ui.config('automv', 'similarity', '100'))
+        threshold = ui.configint('automv', 'similarity', 95)
+        if not 0 <= threshold <= 100:
+            raise error.Abort(_('automv.similarity must be between 0 and 100'))
         if threshold > 0:
             match = scmutil.match(repo[None], pats, opts)
             added, removed = _interestingfiles(repo, match)
diff --git a/tests/test-automv.t b/tests/test-automv.t
--- a/tests/test-automv.t
+++ b/tests/test-automv.t
@@ -13,7 +13,7 @@
 
 Test automv command for commit
 
-  $ echo 'foo' > a.txt
+  $ printf 'foo\nbar\nbaz\n' > a.txt
   $ hg add a.txt
   $ hg commit -m 'init repo with a'
 
@@ -37,6 +37,24 @@
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -161,6 +179,29 @@
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -285,3 +326,13 @@
   $ hg status --change . -C
   A b.txt
   R a.txt
+
+error conditions
+
+  $ cat >> $HGRCPATH << EOF
+  > [automv]
+  > similarity=110
+  > EOF
+  $ hg commit -m 'revision to amend to'
+  abort: automv.similarity must be between 0 and 100
+  [255]

