File move analysis for automv similarity. Was: [PATCH] automv: new experimental extension

Martijn Pieters mj at zopatista.com
Mon Feb 15 12:24:22 UTC 2016


On Sat, Feb 13, 2016 at 11:20 PM, Augie Fackler <raf at durin42.com> wrote:
> I think we should document this value as being something we reserve
> the right to refine later, after doing some history analysis. I
> suspect we can do some analysis over firefox/facebook/google/whatever
> histories and come up with a better value than 1.0 as a default.

I've already run the analysis on the Python, Mozilla-Central and
mercurial repositories, as well as two Facebook repositories (or
rather, on a large number of commits, as a full analysis would take too
long). I just have not yet gotten around to doing much with the numbers
generated.

I've attached the script I used for this, as well as a zip with the
raw results (JSON files). I recorded two things: the similarity for
recorded moves, and the maximum similarity for any added file against
removed files where no move was recorded (with a minimum of 50%). The
second figure gives a sense of how many files are affected when
enabling automv at a certain percentage.
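
For anyone without the attachment, the counting could be sketched
roughly as below. This is not the attached script: the function name
cumulative_counts and the sample scores are made up for illustration,
and it assumes that each table row N holds the number of files whose
similarity is at least N percent.

```python
from collections import Counter

def cumulative_counts(scores, minimum=0):
    """Return {threshold: number of scores >= threshold} for minimum..100."""
    per_score = Counter(int(s) for s in scores if s >= minimum)
    counts = {}
    running = 0
    # Walk down from 100 so each threshold includes everything above it.
    for threshold in range(100, minimum - 1, -1):
        running += per_score[threshold]
        counts[threshold] = running
    return counts

# Hypothetical similarity percentages, one per recorded move.
sample = [100, 100, 97, 95, 80, 42]
table = cumulative_counts(sample)
for threshold in (0, 50, 95, 100):
    print("%3d %d" % (threshold, table[threshold]))
```

Passing minimum=50 would give the shape of the 'missing' moves table,
which starts at 50 rather than 0.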

The cumulative results for moves across all analysed revisions look like this:

  0 94050
  1 93673
  2 93422
  3 93362
  4 93332
  5 93300
  6 93262
  7 93228
  8 93188
  9 93155
 10 93127
 11 93091
 12 93058
 13 93020
 14 92986
 15 92959
 16 92931
 17 92904
 18 92872
 19 92836
 20 92797
 21 92777
 22 92739
 23 92700
 24 92667
 25 92622
 26 92590
 27 92551
 28 92510
 29 92466
 30 92418
 31 92384
 32 92337
 33 92307
 34 92277
 35 92229
 36 92174
 37 92110
 38 92042
 39 91982
 40 91930
 41 91866
 42 91808
 43 91752
 44 91707
 45 91635
 46 91579
 47 91527
 48 91461
 49 91400
 50 91331
 51 91249
 52 91198
 53 91139
 54 91085
 55 90996
 56 90915
 57 90835
 58 90752
 59 90642
 60 90548
 61 90459
 62 90337
 63 90218
 64 90111
 65 90021
 66 89915
 67 89792
 68 89703
 69 89590
 70 89483
 71 89360
 72 89226
 73 89085
 74 88948
 75 88810
 76 88609
 77 88418
 78 88211
 79 87987
 80 87804
 81 87585
 82 87367
 83 87142
 84 86864
 85 86610
 86 86280
 87 85971
 88 85612
 89 85254
 90 84877
 91 84453
 92 83937
 93 83399
 94 82818
 95 82169
 96 81343
 97 80248
 98 79135
 99 77826
100 76190

and for 'missing' moves (added files with no recorded move) the results are:

 50 23791
 51 23638
 52 23341
 53 23056
 54 22576
 55 22305
 56 22027
 57 21797
 58 21574
 59 21343
 60 21116
 61 20880
 62 20677
 63 20415
 64 20174
 65 19958
 66 19307
 67 18898
 68 18538
 69 18259
 70 18049
 71 17794
 72 17576
 73 17318
 74 17119
 75 16872
 76 16662
 77 16460
 78 16255
 79 16042
 80 15883
 81 15652
 82 15404
 83 15207
 84 15015
 85 14796
 86 14613
 87 14384
 88 14118
 89 13751
 90 13462
 91 13160
 92 12828
 93 12500
 94 12112
 95 11691
 96 11129
 97 10675
 98 10180
 99  9644
100  9059
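
To put the two tables side by side, a small sketch like the following
could be used. The counts are copied from a handful of rows above; the
function name tradeoff is hypothetical, and reading each row as "files
at or above this percentage" is an assumption about the tables.

```python
# Counts copied from the two tables above (threshold -> files at or above it).
recorded = {50: 91331, 75: 88810, 90: 84877, 95: 82169, 100: 76190}
missing = {50: 23791, 75: 16872, 90: 13462, 95: 11691, 100: 9059}
total_recorded = 94050  # row 0 of the first table

def tradeoff(threshold):
    """Share of recorded moves at or above threshold, and unrecorded candidates."""
    share = 100.0 * recorded[threshold] / total_recorded
    return share, missing[threshold]

for t in sorted(recorded):
    share, extra = tradeoff(t)
    print("%3d%%: %.1f%% of recorded moves, %d unrecorded candidates" % (t, share, extra))
```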

There is no clear winning percentage here, although I suspect there is
an interesting ramp-up at 95% that may indicate there is a sweet spot
there.
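
One way to look at that ramp-up is the drop in count between
consecutive percentages. The sketch below uses rows 90 to 100 copied
from the two tables; the function name drops is hypothetical, and this
is only one possible way to slice the numbers.

```python
# Rows 90..100 copied from the two tables above.
recorded = [84877, 84453, 83937, 83399, 82818, 82169, 81343, 80248, 79135, 77826, 76190]
missing = [13462, 13160, 12828, 12500, 12112, 11691, 11129, 10675, 10180, 9644, 9059]

def drops(counts, start=90):
    """Map each threshold to the number of files lost by raising it one point."""
    return {start + i: counts[i] - counts[i + 1] for i in range(len(counts) - 1)}

for name, counts in (("recorded", recorded), ("missing", missing)):
    print(name)
    for threshold, lost in sorted(drops(counts).items()):
        print("  %3d -> %3d: %d" % (threshold, threshold + 1, lost))
```

For recorded moves the step from 94 to 95 loses 649 files and the step
from 95 to 96 loses 826, with the per-step loss continuing to grow up
to 100.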

Feel free to give a better analysis of the results.

-- 
Martijn Pieters
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collect_change_scores.py
Type: text/x-python-script
Size: 5555 bytes
Desc: not available
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160215/cafff4c2/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: move_percentages.zip
Type: application/zip
Size: 2387 bytes
Desc: not available
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160215/cafff4c2/attachment.zip>

