Re: Performance with binary-heavy repositories

Christoph.Spiel at partner.bmw.de
Fri Aug 3 05:07:21 CDT 2007


Bryan -

> Could you give me an idea of the
> sizes of your files, please?

I'll give you even more. ;)  Here is a histogram of the file sizes.

   Size/Bytes   Occurrences
   ==========   ============
        5977        1355
       46882         108
       84918          42
      107234          18
      144558          13
      196372           6
      245490           2
      256062           3
      320656           3
      450022           1
      737280           2
      975360           1
     1167872           1
     1624576           1
     2211809           1
     2694375           1
     5317610           1
     5505148           1
    12460544           1
    14458072           1
    24047618           1
    27227648           1

This means, for example, that we have 1355 binary files with a median
size of 5,977 bytes and one file with a size of 27,227,648 bytes.
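
A table of this shape (one representative size and a count per bin)
could be produced roughly as follows. This is only a sketch: the
binning behind the numbers above is not stated, so the fixed-width
bins, the function name size_histogram, and the bin_width parameter
are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def size_histogram(sizes, bin_width):
    """Group file sizes into fixed-width bins (an assumed scheme).

    Returns one (median size, count) pair per non-empty bin,
    ordered from the smallest bin to the largest.
    """
    bins = defaultdict(list)
    for size in sizes:
        # Integer division maps each size to the index of its bin.
        bins[size // bin_width].append(size)
    return [(median(members), len(members))
            for _, members in sorted(bins.items())]
```

The sizes themselves could be collected with os.path.getsize() while
walking the working directory.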

To help you a bit more, I collected the histogram of the
"line lengths" of the binary files.

    Line Length    Occurrences
    ===========    ============
           21         1288997
          160            6666
          311             304
          445             182
          572             113
          701             112
          824              82
          953              66
         1086              45
         1236              61
         1345              26
         1506              30
         1579              17
         1782              15
         1852              25
         2045               6
         2152              20
         2355               7
         2585               6
         2644               5
         2830               3
         3079               1
         3315               2
         3878               2
         4012              17
         4698               1
         6068               1
         6947               1

It looks like the assumption "most lines are shorter than 100
characters" is pretty good even when applied to _our_ binary data.
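
For anyone who wants to repeat the measurement on their own data, a
minimal sketch follows. It assumes that a "line" in a binary file
means the run of bytes between two newline bytes (0x0A); how the
numbers above were actually obtained is not stated. The names
line_lengths and share_below are hypothetical.

```python
def line_lengths(data: bytes) -> list[int]:
    """Lengths of the byte runs between newline bytes (newline excluded)."""
    if not data:
        return []
    chunks = data.split(b"\n")
    if data.endswith(b"\n"):
        # A trailing newline terminates the last line; it does not
        # start an additional empty one.
        chunks.pop()
    return [len(chunk) for chunk in chunks]


def share_below(lengths, limit=100):
    """Fraction of lines strictly shorter than `limit` bytes."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < limit) / len(lengths)
```

Reading each file in binary mode (open(path, "rb")) and feeding its
contents to line_lengths() gives the raw lengths; share_below() then
checks the "shorter than 100 characters" assumption directly.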

Regards,
        Chris

PS: I'll be out of the office until 2007-08-20.

--
Dr. Christoph L. Spiel
BMW Forschungs- und Innovationszentrum, EA-410
Lauchstaedterstrasse 5, 80995 Muenchen


