size of repository with many branches, vs. git
Dov Feldstern
dfeldstern at fastimap.com
Mon Mar 31 16:46:44 CDT 2008
(in which mercurial is exonerated, so read on!)
So yesterday I discovered that part of the problem was that one branch,
personal, actually contains subdirectories with lots of branches in
them, but was being converted as a single, very large branch. That
branch was obviously very different from the other branches, and had
multiple copies of each file in it.
So here's what I tried: first, I cloned each branch (excluding personal)
into a separate repository (with hg clone --rev), so I now had 23
repositories. Their sizes are in Appendix 1 (interestingly, the
repositories were apparently not hardlinked, which made it easier to
get the sizes with du). The total size of all the branches came out
to 914452 KB, which is still not great.
Next, I cloned the default branch into a new repository, and then pulled
all the other branches into it, one at a time. So I now had a single
repository of all the branches, but in which the revisions were
basically as linear as possible. The size of this repository on disk was
159608 KB, which is much much better!
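A rough sketch of the clone-then-pull procedure described above, written as a small Python helper that only builds the hg command lines and does not run them. The repository paths are placeholders, and the use of --rev on the pull step is an assumption (the post only says the branches were pulled one at a time).

```python
def linear_merge_commands(source, dest, branches, first="default"):
    """Build hg command lines: clone `first`, then pull every other branch.

    `source` and `dest` are placeholder repository paths. Nothing is
    executed here; the caller decides whether to run the commands.
    """
    commands = [["hg", "clone", "--rev", first, source, dest]]
    for branch in branches:
        if branch == first:
            continue  # already present from the initial clone
        # Assumption: each branch is pulled by name with --rev.
        commands.append(["hg", "pull", "-R", dest, "--rev", branch, source])
    return commands


# Hypothetical paths; the branch names are taken from Appendix 1.
example = linear_merge_commands(
    "lyx-converted", "lyx-linear",
    ["default", "BRANCH_1_5_X", "BRANCH_1_4_X"],
)
for cmd in example:
    print(" ".join(cmd))
```

Pulling one branch at a time keeps each branch's revisions next to each other in the new repository, which is the "as linear as possible" ordering mentioned above.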
Finally, to complete our tests, I converted this new repository into yet
another one, with --datesort. And here's where I was really surprised:
despite the datesorting, which causes quite a bit of interleaving
between branches, the size of the datesorted repository is only 172616 KB!
So it seems that the non-standard branch layout in the original svn
repository really was the culprit (recall that the size I started out
with was ~1GB, datesorted). It's really worth noting, especially for
anyone converting to mercurial, how sensitive mercurial is to changes
in the layout of the tree between revisions. I guess that just moving
files from one place to another in one branch, and not in another,
would cause similar issues? Interestingly, git seems to have dealt with
this well, even though it also converted personal as a single branch:
the git repository is only 157300 KB, and that includes personal
(although as a single branch). I guess this is where tracking content
vs. tracking files really makes a difference...
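To put the numbers above side by side, here is a small Python sketch that expresses each size relative to the branch-by-branch hg repository. The sizes are the ones quoted in this mail; the labels are mine, and the comparison is only approximate since the git repository includes personal while the hg ones do not.

```python
# Sizes on disk in KB, as quoted in the text above.
sizes_kb = {
    "23 separate hg repositories": 914452,
    "single hg repository, branch by branch": 159608,
    "single hg repository, datesorted": 172616,
    "git repository (includes personal)": 157300,
}


def relative_to(baseline_key, sizes):
    """Return each size divided by the size stored under baseline_key."""
    base = sizes[baseline_key]
    return {name: size / base for name, size in sizes.items()}


ratios = relative_to("single hg repository, branch by branch", sizes_kb)
for name, ratio in ratios.items():
    print(f"{name}: {sizes_kb[name]} KB ({ratio:.2f}x)")
```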
Now I'm still a bit stuck with LyX, because I don't have any way to keep
the converted repository up-to-date, without pulling personal in again.
It would be really great if the convert extension would allow me to
specify full paths within the original svn repository to each of the
branches I want to convert, and/or to specify only those branches which
I want to include or ignore. (Of course, this means that I would miss
new branches which may be added later on, until I update the
branch-path-map manually... but I don't see any way of doing this
automatically, unless the svn repo itself stores information about which
paths are actual copies of the trunk, as opposed to just directories
created manually?)
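To make the request a bit more concrete, here is a purely hypothetical sketch in Python of what such a branch-path-map could look like. None of this is an existing option of the convert extension; the svn paths, the names BRANCH_PATHS, IGNORED and resolve are all made up for illustration.

```python
# Hypothetical map from full svn paths to the branch each should become.
BRANCH_PATHS = {
    "trunk": "default",
    "branches/BRANCH_1_5_X": "BRANCH_1_5_X",
    "branches/personal/someone/some-branch": "some-branch",
}
# Hypothetical list of svn directories to skip entirely.
IGNORED = {"branches/personal"}


def resolve(svn_path, branch_paths, ignored=()):
    """Return (branch, path inside branch) for svn_path, or None if unmapped."""
    parts = svn_path.strip("/").split("/")
    # Try the longest prefix first, so an explicit entry below an ignored
    # directory still wins over the ignore rule.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in branch_paths:
            return branch_paths[prefix], "/".join(parts[end:])
        if prefix in ignored:
            return None
    return None


print(resolve("trunk/src/main.cpp", BRANCH_PATHS, IGNORED))
print(resolve("branches/personal/other/file.txt", BRANCH_PATHS, IGNORED))
```

As noted above, a map like this would have to be updated by hand whenever a new branch shows up in the svn repository.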
Finally, just in case you're still interested, I'm adding the
revlogstats output for both of the all-branches-in-one repositories:
the datesorted and the non-datesorted. It may still be worth pursuing
this approach...
Again, thanks everyone for your help, and I hope that this can supply
some constructive insights to others who may be having trouble with
repository size...
Dov
Appendix 1: size on disk (KB) of each branch in a separate repository
---------------------------------------------------------------------
  3884 LyX-Team
  4104 string-switch
  4268 pathswitch
  4312 debugstream
  4980 runlatex
  7560 rae
  9008 dialogbase
 12396 lyx-1_1_5
 15120 obsolete
 17436 BRANCH_new_insets
 17736 BRANCH_1_1_6
 18440 BRANCH_MVC
 20884 BRANCH_NATBIB
 33288 BRANCH-1_2_X
 33380 BRANCH_GUII
 49464 BRANCH_NOUPDATE
 67552 CoordBranch
 72736 BRANCH_1_3_X
 81316 BooktabBranch
 86824 gtkdevel
 87464 BRANCH_1_4_X
127348 BRANCH_1_5_X
134952 default
914452 total
Appendix 2: revlogstats for the non-datesorted, all-branches-in-one
repository:
-------------------------------------------------------------------
0 0 0
1000 664906 536408 80.67
2000 1215181 948095 78.02
3000 1557129 1239030 79.57
4000 1999908 1578821 78.94
5000 2420433 1920701 79.35
6000 2862426 2275816 79.51
7000 3427959 2780318 81.11
8000 3930118 3187958 81.12
9000 4281668 3430958 80.13
10000 4671308 3699924 79.21
11000 4953131 3981747 80.39
12000 5221487 4141631 79.32
13000 5641740 4451862 78.91
14000 5835105 4645227 79.61
15000 6199522 4897644 79.00
16000 6434720 5132842 79.77
17000 6714012 5307982 79.06
18000 7028967 5484834 78.03
19000 8010598 5923646 73.95
19129 8268318 6069547 73.41
Appendix 3: revlogstats for the datesorted, all-branches-in-one repository:
-------------------------------------------------------------------
0 0 0
1000 1396350 609093 43.62
2000 5239618 1139758 21.75
3000 5633760 1433992 25.45
4000 7001082 1752538 25.03
5000 11773611 2102094 17.85
6000 12207434 2453218 20.10
7000 12713697 2806189 22.07
8000 13400747 3337689 24.91
9000 14123493 3703632 26.22
10000 14541697 3952737 27.18
11000 14936154 4215431 28.22
12000 15243602 4471724 29.34
13000 15515288 4635210 29.88
14000 15907699 4920107 30.93
15000 16117844 5130252 31.83
16000 16476146 5376603 32.63
17000 16691398 5591855 33.50
18000 17026828 5781577 33.96
19000 17433378 5969085 34.24
19130 18317979 5991407 32.71
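The columns in Appendices 2 and 3 are not labelled here, but the fourth one appears to be the third divided by the second, as a percentage. The Python sketch below checks that reading on a few rows copied from the two tables; the helper names are mine, and what the second and third columns actually measure is not stated in this mail.

```python
def parse_rows(text):
    """Parse 'rev second third percent' rows, skipping incomplete ones."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # the first row ("0 0 0") has no fourth column
        rev, second, third = (int(f) for f in fields[:3])
        rows.append((rev, second, third, float(fields[3])))
    return rows


def percent(second, third):
    """Third column as a percentage of the second, to two decimals."""
    return round(100.0 * third / second, 2)


# First and last complete rows of Appendix 2, then of Appendix 3.
sample = """
0 0 0
1000 664906 536408 80.67
19129 8268318 6069547 73.41
1000 1396350 609093 43.62
19130 18317979 5991407 32.71
"""

rows = parse_rows(sample)
for rev, second, third, listed in rows:
    print(rev, listed, percent(second, third))
```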