File move analysis for automv similarity. Was: [PATCH] automv: new experimental extension

Pierre-Yves David pierre-yves.david at ens-lyon.org
Mon Feb 15 07:55:43 EST 2016



On 02/15/2016 12:24 PM, Martijn Pieters wrote:
> On Sat, Feb 13, 2016 at 11:20 PM, Augie Fackler <raf at durin42.com> wrote:
>> I think we should document this value as being something we reserve
>> the right to refine later, after doing some history analysis. I
>> suspect we can do some analysis over firefox/facebook/google/whatever
>> histories and come up with a better value than 1.0 as a default.
>
> I've already run the analysis on the Python, Mozilla-Central and
> mercurial repositories, as well as two Facebook repositories (or
> rather, on a large number of commits as a full analysis would take too
> long). I just have not yet gotten around to doing much with the numbers
> generated.
>
> I've attached the script I've used for this, as well as a zip with the
> raw results (JSON files). I recorded both the similarity for recorded
> moves, as well as the maximum similarity for any added file against
> removed files where no move was recorded (with a minimum of 50%), to
> get a sense of how many files are affected when enabling automv at a
> certain percentage.
>
> The cumulative results for moves across all analysed revisions look like this:
>
>    0 94050
>    1 93673
>    2 93422
>    3 93362
>    4 93332
>    5 93300
>    6 93262
>    7 93228
>    8 93188
>    9 93155
>   10 93127
>   11 93091
>   12 93058
>   13 93020
>   14 92986
>   15 92959
>   16 92931
>   17 92904
>   18 92872
>   19 92836
>   20 92797
>   21 92777
>   22 92739
>   23 92700
>   24 92667
>   25 92622
>   26 92590
>   27 92551
>   28 92510
>   29 92466
>   30 92418
>   31 92384
>   32 92337
>   33 92307
>   34 92277
>   35 92229
>   36 92174
>   37 92110
>   38 92042
>   39 91982
>   40 91930
>   41 91866
>   42 91808
>   43 91752
>   44 91707
>   45 91635
>   46 91579
>   47 91527
>   48 91461
>   49 91400
>   50 91331
>   51 91249
>   52 91198
>   53 91139
>   54 91085
>   55 90996
>   56 90915
>   57 90835
>   58 90752
>   59 90642
>   60 90548
>   61 90459
>   62 90337
>   63 90218
>   64 90111
>   65 90021
>   66 89915
>   67 89792
>   68 89703
>   69 89590
>   70 89483
>   71 89360
>   72 89226
>   73 89085
>   74 88948
>   75 88810
>   76 88609
>   77 88418
>   78 88211
>   79 87987
>   80 87804
>   81 87585
>   82 87367
>   83 87142
>   84 86864
>   85 86610
>   86 86280
>   87 85971
>   88 85612
>   89 85254
>   90 84877
>   91 84453
>   92 83937
>   93 83399
>   94 82818
>   95 82169
>   96 81343
>   97 80248
>   98 79135
>   99 77826
> 100 76190
>
> and for 'missing' moves the result is:
>
>   50 23791
>   51 23638
>   52 23341
>   53 23056
>   54 22576
>   55 22305
>   56 22027
>   57 21797
>   58 21574
>   59 21343
>   60 21116
>   61 20880
>   62 20677
>   63 20415
>   64 20174
>   65 19958
>   66 19307
>   67 18898
>   68 18538
>   69 18259
>   70 18049
>   71 17794
>   72 17576
>   73 17318
>   74 17119
>   75 16872
>   76 16662
>   77 16460
>   78 16255
>   79 16042
>   80 15883
>   81 15652
>   82 15404
>   83 15207
>   84 15015
>   85 14796
>   86 14613
>   87 14384
>   88 14118
>   89 13751
>   90 13462
>   91 13160
>   92 12828
>   93 12500
>   94 12112
>   95 11691
>   96 11129
>   97 10675
>   98 10180
>   99  9644
> 100  9059
>
> There is no clear winning percentage here, although I suspect there is
> an interesting ramp-up at 95% that may indicate there is a sweet spot
> there.
>
> Feel free to give a better analysis of the results.

Would it be possible to get a version where the cumulative coverage is
provided in %?
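
To make the request concrete, here is a minimal sketch of the conversion I
have in mind. It is not part of the attached script: `to_percent` is a
hypothetical helper, and taking the count at the lowest threshold in each
table as the 100% baseline is my assumption. Only a few rows from the two
tables above are included for illustration.

```python
def to_percent(cumulative):
    """Convert {threshold: count} into {threshold: percent of the baseline}.

    Assumption: the baseline is the count at the lowest threshold present,
    i.e. the total number of files counted in that table.
    """
    if not cumulative:
        return {}
    baseline = cumulative[min(cumulative)]
    if baseline == 0:
        raise ValueError("baseline count is zero")
    return {t: 100.0 * n / baseline for t, n in sorted(cumulative.items())}


# A few rows copied from the two tables above (not the full data set).
recorded_moves = {0: 94050, 50: 91331, 90: 84877, 95: 82169, 100: 76190}
missing_moves = {50: 23791, 75: 16872, 90: 13462, 95: 11691, 100: 9059}

if __name__ == "__main__":
    for name, table in (("recorded", recorded_moves),
                        ("missing", missing_moves)):
        print(name)
        for threshold, pct in to_percent(table).items():
            print("%5d %6.2f%%" % (threshold, pct))
```

Read this way, each row says what share of the recorded (or missing) moves
would still be picked up at a given similarity threshold, which should make
the repositories easier to compare.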

-- 
Pierre-Yves David

