size of repository with many branches, vs. git
Dov Feldstern
dfeldstern at fastimap.com
Sun Mar 30 14:23:01 CDT 2008
Matt Mackall wrote:
> On Sat, 2008-03-29 at 23:59 +0300, Dov Feldstern wrote:
>> The conversion went well (with some help from Patrick), but the result
>> was disappointing to me: the size of the cloned repository is between
>> ~700MB (with no --datesort, converted in chunks of 1000 revisions at a
>> time) to ~1GB (with --datesort, which probably better reflects what
>> would happen over time as the project is tracked in real-time from svn).
>> By comparison, the entire git repository (freshly cloned) is only ~200MB!
>
> Mercurial compression is suboptimal in the following ways:
>
> - every working directory file in the history has a backing repository
> file so the typical repository will grow by (filesystem block size *
> number of files in history)/2
Ah, I guess that would explain why on two different machines (the one on
which conversion took place, and my local clone) the repositories vary
quite a bit in size (~1GB vs. ~1.4GB)? Both are ext3, but could it be
that they have different block sizes?
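
To get a feel for the size of this effect, here is a rough sketch of the
estimate quoted above, (filesystem block size * number of files in
history)/2. The function names are made up for illustration, the file
count is the store figure reported further down in this message, and the
block sizes are just the ones ext3 supports (1024, 2048, 4096); which one
each machine actually uses is not known here.

```python
import os


def estimated_slack_bytes(block_size, files_in_history):
    """Apply the rule of thumb: on average, half a block is wasted per file."""
    return block_size * files_in_history // 2


def filesystem_block_size(path):
    """Report the block size of the filesystem holding `path` (Unix only)."""
    return os.statvfs(path).f_bsize


def slack_table(files_in_history, block_sizes=(1024, 2048, 4096)):
    """Return (block_size, estimated_slack_bytes) pairs for comparison."""
    return [(bs, estimated_slack_bytes(bs, files_in_history))
            for bs in block_sizes]


# Number of files in the store, as counted in this message (assumed to be
# a fair stand-in for "number of files in history").
FILES_IN_STORE = 84215

for block_size, slack in slack_table(FILES_IN_STORE):
    print("block size %5d: ~%.1f MiB of slack"
          % (block_size, slack / (1024.0 * 1024.0)))
```

Calling filesystem_block_size() on the repository path on each machine
would show whether the two ext3 filesystems really do differ.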
> - copies and renames store a full new revision at the target
> - revlog storage is linear so interleaving of branches in a single
> revlog reduces compression
>
I assume that this last point is what causes a lot of the trouble in my
case. I guessed that something like that must be going on when I saw
the difference between the datesorted and the non-datesorted repos.
In LyX, we normally have two main branches (trunk and the latest
stable release), both of which are committed to quite often (multiple
times a day). The development cycle of a release lasts about a year
or more, so the branch diverges from the trunk quite a bit over
this time period, not to mention some users who have personal branches.
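
The interleaving effect can be illustrated with a toy model. This is not
Mercurial's revlog code: it only assumes, as described above, that each
revision is stored as a delta against whatever was stored just before it
in the same file. The "manifests" are lists of made-up lines, and the
delta size is simply the number of lines a line-based diff adds or
removes.

```python
import difflib


def delta_size(old, new):
    """Count the lines a line-based diff would have to remove or add."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    size = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            size += (i2 - i1) + (j2 - j1)
    return size


def make_branch(base, label, n_revs):
    """Build n_revs revisions; each one changes one more line of the base."""
    revs = []
    current = list(base)
    for i in range(n_revs):
        current = list(current)
        current[i] = "%s-%d" % (label, i)
        revs.append(current)
    return revs


def linear_cost(revisions):
    """Sum the delta sizes of each revision against the one stored before it."""
    return sum(delta_size(a, b) for a, b in zip(revisions, revisions[1:]))


# A toy "manifest" of 200 entries and two branches that drift apart.
base = ["file%d hash0" % i for i in range(200)]
trunk = make_branch(base, "trunk", 20)
stable = make_branch(base, "stable", 20)

# Storage order 1: all of trunk, then all of stable.
grouped = trunk + stable
# Storage order 2: commits alternate between branches, as with a date sort.
interleaved = [rev for pair in zip(trunk, stable) for rev in pair]

grouped_cost = linear_cost(grouped)
interleaved_cost = linear_cost(interleaved)
print("grouped: %d changed lines" % grouped_cost)
print("interleaved: %d changed lines" % interleaved_cost)
```

In this model every switch between branches has to re-encode the whole
difference between them, so the interleaved order costs more, and the
gap widens the further the branches diverge.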
> The last problem mostly appears in the manifest as it gets touched by
> every commit on every branch. How many files are in your working dir,
> how many files are in your store, how many changesets do you have, and
> how big is your 00manifest.i?
>
Here are the numbers for my converted repository:
*) files in working directory (actually, the number of files in the
   output of 'hg manifest' on tip): 3281
*) files in store: 84215 ('find .hg/store/data | wc -l')
*) changesets: 21123
*) size of 00manifest.i: 1351744 bytes
*) size of 00manifest.d: 415874316 bytes! (i.e., about 35-40% of the
   repository size is in this single file)
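
For anyone who wants to collect the same numbers on another repository,
here is a small sketch. The helper names are invented for this example,
and it assumes the usual layout with the store under .hg/store. Unlike
the 'find ... | wc -l' count above, it counts regular files only, so
directories are not included, and it sums logical file sizes rather than
blocks used on disk.

```python
import os


def store_stats(store_path):
    """Walk store_path; return (file count, total bytes, sizes by rel. path)."""
    file_count = 0
    total_bytes = 0
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(store_path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            size = os.path.getsize(full)  # logical size, not disk usage
            file_count += 1
            total_bytes += size
            sizes[os.path.relpath(full, store_path)] = size
    return file_count, total_bytes, sizes


def share_percent(part, whole):
    """Return part as a percentage of whole (0.0 if whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def manifest_share(store_path, manifest_name="00manifest.d"):
    """Percentage of the store's bytes taken up by the manifest data file."""
    _count, total, sizes = store_stats(store_path)
    return share_percent(sizes.get(manifest_name, 0), total)
```

Something like manifest_share(".hg/store"), run from the top of a clone,
would then give the percentage for that repository.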
I'd be happy to provide a bundle of the converted repository for
inspection, if anyone can provide me space to which I can upload a
450MB+ file.
So, are there any thoughts on improving any of these issues? Or am I
trying to use Mercurial in a way that is too different from the
way it's intended to be used (storing many branches in a single
repository, tracking a foreign repository with these branches very
closely, ...)?
Thanks!
Dov