File move analysis for automv similarity. Was: [PATCH] automv: new experimental extension
Martijn Pieters
mj at zopatista.com
Mon Feb 15 12:24:22 UTC 2016
On Sat, Feb 13, 2016 at 11:20 PM, Augie Fackler <raf at durin42.com> wrote:
> I think we should document this value as being something we reserve
> the right to refine later, after doing some history analysis. I
> suspect we can do some analysis over firefox/facebook/google/whatever
> histories and come up with a better value than 1.0 as a default.
I've already run the analysis on the Python, Mozilla-Central and
Mercurial repositories, as well as two Facebook repositories (or
rather, on a large number of commits, as a full analysis would take too
long). I just have not yet gotten around to doing much with the numbers
generated.
I've attached the script I used for this, as well as a zip with the
raw results (JSON files). I recorded the similarity for recorded
moves, and also the maximum similarity for any added file against
removed files where no move was recorded (with a minimum of 50%), to
get a sense of how many files would be affected when enabling automv at
a certain percentage.
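To make the second measurement concrete, here is a minimal sketch of
"maximum similarity for any added file against removed files". It is
not the attached script: it uses difflib.SequenceMatcher as a stand-in
for Mercurial's own similarity score, and the names similarity and
best_unrecorded_matches, as well as the dict-of-contents inputs, are
hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT Mercurial's own measure."""
    return SequenceMatcher(None, a, b).ratio()


def best_unrecorded_matches(added, removed, minimum=0.5):
    """For each added file, find the most similar removed file.

    `added` and `removed` map file names to their text contents
    (an assumed input shape). Returns {added_name: (removed_name, score)}
    for every added file whose best score is at least `minimum`.
    """
    results = {}
    for new_name, new_text in added.items():
        best = None
        for old_name, old_text in removed.items():
            score = similarity(old_text, new_text)
            if best is None or score > best[1]:
                best = (old_name, score)
        # Only keep candidates at or above the cutoff (50% by default).
        if best is not None and best[1] >= minimum:
            results[new_name] = best
    return results
```

Comparing every added file against every removed file is quadratic
per changeset, which is one reason to sample commits rather than walk
a whole history.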
The cumulative results for moves across all analysed revisions look like
this (similarity percentage, then the number of recorded moves at or
above that similarity):
0 94050
1 93673
2 93422
3 93362
4 93332
5 93300
6 93262
7 93228
8 93188
9 93155
10 93127
11 93091
12 93058
13 93020
14 92986
15 92959
16 92931
17 92904
18 92872
19 92836
20 92797
21 92777
22 92739
23 92700
24 92667
25 92622
26 92590
27 92551
28 92510
29 92466
30 92418
31 92384
32 92337
33 92307
34 92277
35 92229
36 92174
37 92110
38 92042
39 91982
40 91930
41 91866
42 91808
43 91752
44 91707
45 91635
46 91579
47 91527
48 91461
49 91400
50 91331
51 91249
52 91198
53 91139
54 91085
55 90996
56 90915
57 90835
58 90752
59 90642
60 90548
61 90459
62 90337
63 90218
64 90111
65 90021
66 89915
67 89792
68 89703
69 89590
70 89483
71 89360
72 89226
73 89085
74 88948
75 88810
76 88609
77 88418
78 88211
79 87987
80 87804
81 87585
82 87367
83 87142
84 86864
85 86610
86 86280
87 85971
88 85612
89 85254
90 84877
91 84453
92 83937
93 83399
94 82818
95 82169
96 81343
97 80248
98 79135
99 77826
100 76190
For 'missing' moves (added files with no recorded move), the results are:
50 23791
51 23638
52 23341
53 23056
54 22576
55 22305
56 22027
57 21797
58 21574
59 21343
60 21116
61 20880
62 20677
63 20415
64 20174
65 19958
66 19307
67 18898
68 18538
69 18259
70 18049
71 17794
72 17576
73 17318
74 17119
75 16872
76 16662
77 16460
78 16255
79 16042
80 15883
81 15652
82 15404
83 15207
84 15015
85 14796
86 14613
87 14384
88 14118
89 13751
90 13462
91 13160
92 12828
93 12500
94 12112
95 11691
96 11129
97 10675
98 10180
99 9644
100 9059
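Both tables are cumulative: each row counts the files whose similarity
is at or above that percentage. A minimal sketch of how such a table
can be built from raw scores follows. It assumes the scores are already
integer percentages in 0..100 (how the attached script rounds them is
not shown here), and cumulative_counts is a hypothetical name.

```python
def cumulative_counts(scores, minimum=0):
    """Map each threshold N to the number of scores >= N.

    `scores` are integer similarity percentages in 0..100 (assumed).
    Thresholds below `minimum` are not reported.
    """
    per_percent = [0] * 101
    for s in scores:
        per_percent[s] += 1

    counts = {}
    running = 0
    # Walk from 100% downwards so the running total is "at or above N".
    for pct in range(100, minimum - 1, -1):
        running += per_percent[pct]
        counts[pct] = running
    return counts


if __name__ == "__main__":
    table = cumulative_counts([100, 100, 97, 60, 50], minimum=50)
    for pct in sorted(table):
        print(pct, table[pct])
```

Read this way, a row "N count" says how many files would be treated
as moves if automv were enabled with a similarity threshold of N%.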
There is no clear winning percentage here, although I suspect there is
an interesting ramp-up at 95% that may indicate a sweet spot there.
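One way to look at the ramp-up is to convert the cumulative rows back
into per-percentage buckets by subtracting neighbouring rows. The
sketch below does that for the 94..100 rows copied from the two tables
above; per_bucket is a hypothetical helper, and it assumes the rows are
contiguous whole percentages.

```python
def per_bucket(cumulative):
    """Turn 'count at or above N' into 'count in bucket N'.

    `cumulative` maps contiguous integer percentages to counts.
    """
    top = max(cumulative)
    buckets = {}
    for pct in sorted(cumulative):
        if pct == top:
            # Nothing lies above the top row, so it is its own bucket.
            buckets[pct] = cumulative[pct]
        else:
            buckets[pct] = cumulative[pct] - cumulative[pct + 1]
    return buckets


# Rows 94..100 copied from the tables in this message.
recorded_tail = {94: 82818, 95: 82169, 96: 81343, 97: 80248,
                 98: 79135, 99: 77826, 100: 76190}
missing_tail = {94: 12112, 95: 11691, 96: 11129, 97: 10675,
                98: 10180, 99: 9644, 100: 9059}

if __name__ == "__main__":
    recorded = per_bucket(recorded_tail)
    missing = per_bucket(missing_tail)
    for pct in sorted(recorded):
        print(pct, recorded[pct], missing[pct])
```

For recorded moves the bucket sizes grow from 649 at 94% to 1636 at
99%, with 76190 at exactly 100%. For missing moves the 94..99 buckets
stay between 421 and 585, with 9059 at 100%.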
Feel free to give a better analysis of the results.
--
Martijn Pieters
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collect_change_scores.py
Type: text/x-python-script
Size: 5555 bytes
Desc: not available
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160215/cafff4c2/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: move_percentages.zip
Type: application/zip
Size: 2387 bytes
Desc: not available
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160215/cafff4c2/attachment.zip>