size of repository with many branches, vs. git

Dov Feldstern dfeldstern at fastimap.com
Sun Mar 30 14:23:01 CDT 2008


Matt Mackall wrote:
> On Sat, 2008-03-29 at 23:59 +0300, Dov Feldstern wrote:
>> The conversion went well (with some help from Patrick), but the result
>> was disappointing to me: the size of the cloned repository is between
>> ~700MB (with no --datesort, converted in chunks of 1000 revisions at a
>> time) to ~1GB (with --datesort, which probably better reflects what
>> would happen over time as the project is tracked in real-time from svn).
>> By comparison, the entire git repository (freshly cloned) is only ~200MB!
> 
> Mercurial compression is suboptimal in the following ways:
> 
> - every working directory file in the history has a backing repository
> file so the typical repository will grow by (filesystem block size *
> number of files in history)/2

Ah, I guess that would explain why on two different machines (the one on 
which conversion took place, and my local clone) the repositories vary 
quite a bit in size (~1GB vs. ~1.4GB)? Both are ext3, but could it be 
that they have different block sizes?
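
One way to check this on each machine is to compare the apparent size of
the store with the space actually allocated for it, and to look at the
block size the filesystem reports. The following is only a rough sketch
for POSIX systems: the function names are made up for illustration, and
it relies on st_blocks being counted in 512-byte units, which is the
usual convention but worth verifying on your platform.

```python
import os
import stat


def disk_usage(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for the
    regular files under root. Directories are not counted."""
    count = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            count += 1
            apparent += st.st_size
            # Assumption: st_blocks is in 512-byte units (usual on POSIX).
            allocated += st.st_blocks * 512
    return count, apparent, allocated


def block_size(path):
    """Fundamental block size of the filesystem holding path (POSIX only)."""
    return os.statvfs(path).f_frsize
```

Running disk_usage(".hg/store") and block_size(".hg/store") on both
machines should show whether the gap between allocated and apparent size
accounts for the difference.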

> - copies and renames store a full new revision at the target
> - revlog storage is linear so interleaving of branches in a single
> revlog reduces compression
> 

I assume that this last point causes much of the trouble in my case. I
guessed that something like that must be going on when I saw the
difference between the datesorted and the non-datesorted repos. In LyX,
we normally have two main branches (trunk and the latest stable
release), both of which are committed to quite often (multiple times a
day). The development cycle of a release lasts about a year or more, so
the branch diverges from the trunk quite a bit over that period, not to
mention some users who have personal branches.
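
To convince myself of the effect, here is a toy model of linear delta
storage. It is not Mercurial's actual revlog format: each revision is
simply stored as a unified diff against whatever came before it in the
log, and the "manifests" are invented lists of lines. Two branches that
drift apart are stored once grouped by branch and once interleaved,
which is roughly the order I would expect a date-sorted conversion to
produce.

```python
import difflib


def delta_size(old, new):
    """Size in bytes of a unified diff between two lists of lines
    (a stand-in for a real binary delta)."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return sum(len(line) + 1 for line in diff)


def linear_store_size(revisions):
    """Total size when the first revision is stored in full and every
    later one as a delta against its predecessor in the log."""
    total = sum(len(line) + 1 for line in revisions[0])
    for prev, cur in zip(revisions, revisions[1:]):
        total += delta_size(prev, cur)
    return total


def make_branch(base, tag, first_line, n_revs):
    """Each revision changes one more line, so the branch drifts
    steadily away from the base."""
    revs = []
    lines = list(base)
    for i in range(n_revs):
        lines = list(lines)  # copy so earlier revisions stay unchanged
        lines[first_line + i] = f"file{first_line + i} {tag}{i}"
        revs.append(lines)
    return revs


base = [f"file{i} base" for i in range(200)]
trunk = make_branch(base, "trunk", 0, 50)
stable = make_branch(base, "stable", 100, 50)

grouped = trunk + stable
interleaved = [rev for pair in zip(trunk, stable) for rev in pair]

grouped_size = linear_store_size(grouped)
interleaved_size = linear_store_size(interleaved)
print("grouped by branch:", grouped_size, "bytes")
print("interleaved:      ", interleaved_size, "bytes")
```

In the interleaved order every delta has to undo the other branch's
accumulated changes and reapply its own, so the deltas keep growing as
the branches diverge, while in the grouped order nearly every delta
touches a single line.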

> The last problem mostly appears in the manifest as it gets touched by
> every commit on every branch. How many files are in your working dir,
> how many files are in your store, how many changesets do you have, and
> how big is your 00manifest.i?
> 

Here are the numbers for my converted repository:
*) working directory (actually, the number of files in the output of 'hg 
manifest' on tip): 3281
*) # of files in store: 84215 ('find .hg/store/data | wc -l')
*) # of changesets: 21123
*) size of 00manifest.i: 1351744 bytes
*) size of 00manifest.d: 415874316 bytes! (i.e., about 35-40% of the 
repository size is in this single file...)
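
For what it's worth, here is a quick back-of-the-envelope calculation
with these numbers. The 4096-byte block size is an assumption (I have
not checked either machine), and the file count is the 'find' output
above, which also includes directories, so treat the results as rough.

```python
# Figures reported above.
files_in_store = 84215       # 'find .hg/store/data | wc -l'
changesets = 21123
manifest_index = 1351744     # bytes in 00manifest.i
manifest_data = 415874316    # bytes in 00manifest.d

# Assumption: 4096-byte filesystem blocks.
BLOCK_SIZE = 4096


def slack_estimate(block_size, n_files):
    """Matt's estimate: on average half a block is wasted per file."""
    return block_size * n_files // 2


def percent(part, whole):
    return 100.0 * part / whole


slack = slack_estimate(BLOCK_SIZE, files_in_store)
manifest_total = manifest_index + manifest_data
per_changeset = manifest_data / changesets

print(f"estimated block slack:   {slack / 1e6:.0f} MB")
print(f"manifest data/changeset: {per_changeset / 1e3:.1f} kB")
# Assumption: the ~1GB and ~1.4GB clone sizes are taken as 1.0e9 and 1.4e9 bytes.
for repo_size in (1.0e9, 1.4e9):
    share = percent(manifest_total, repo_size)
    print(f"manifest share of {repo_size / 1e9:.1f} GB repo: {share:.0f}%")
```

With 4096-byte blocks the slack estimate comes to about 172 MB, so block
overhead alone would not explain the whole gap to git.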

I'd be happy to provide a bundle of the converted repository for 
inspection, if anyone can provide me space to which I can upload a 
450MB+ file.

So, are there any thoughts on improving any of these issues? Or am I
trying to use Mercurial in a way that is too different from the way
it's intended to be used (storing many branches in a single
repository, tracking a foreign repository with these branches very
closely, ...)?

Thanks!
Dov

