Mercurial as a database backend

Adam B cruxic at gmail.com
Mon Aug 13 11:10:21 CDT 2012


On Sat, Aug 11, 2012 at 5:08 AM, Arne Babenhauserheide <arne_bab at web.de> wrote:
> Hi Adam,
>
> On Friday, 10 August 2012, 08:50:20, you wrote:
>> I have a question about Mercurial's internals.  I'm interested in using
>> Mercurial as a database backend for a peer-to-peer application.  One of the
>> main challenges I see with this is maintaining efficiency with hundreds of
>> thousands of records.  Logically each database "record" should be a
>> versioned file in the repository.  However, storing 100k+ files is a really
>> inefficient use of the disk, especially if the records are small.
>
> You could just write an extension or a wrapper, which retrieves the files from
> the backend store without using the working copy and updates the sqlite
> database with them.

Thanks for the pointer - I'll read up on what I can do with just an extension.
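
As a rough starting point, here is a minimal sketch of the wrapper
variant of that idea. It shells out to `hg cat`, which prints a
tracked file as of a given revision from history rather than from the
working copy, and mirrors the result into sqlite. It deliberately uses
the command line instead of an in-process extension. The function
names, the `records` table and its columns are all hypothetical and
only serve the illustration.

```python
import sqlite3
import subprocess


def hg_cat_command(path, rev="tip"):
    """Build the argv for `hg cat`, which prints `path` as of `rev`."""
    return ["hg", "cat", "--rev", rev, path]


def read_from_store(repo_root, path, rev="tip"):
    """Return the bytes of `path` at `rev` without touching the working copy.

    `path` is taken relative to `repo_root`, so hg is run from there.
    """
    result = subprocess.run(
        hg_cat_command(path, rev),
        cwd=repo_root,
        capture_output=True,
        check=True,
    )
    return result.stdout


def cache_record(conn, record_id, body):
    """Insert or overwrite one record in the (hypothetical) sqlite cache."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, body BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO records (id, body) VALUES (?, ?)",
        (record_id, body),
    )
    conn.commit()
```

A caller would combine the two, for example
`cache_record(conn, "abc", read_from_store(repo, "records/abc"))`,
where the `records/abc` layout is again only an assumption.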

>
> But note that Mercurial stores the history in a store structured around files.
>
> Do you actually need to make each record one file? If you use an abstraction
> anyway (database), why not make it a bit more intelligent - for example taking
> the first 3 letters of the record identifier to bin the records into files and
> then sorting the records in the files by the identifier - as the b extension
> does it: http://mercurial.selenic.com/wiki/bExtension
>
> (for the file structure you need to minimize append-at-the-end and
> append-at-the-beginning to allow for easy merging)

In my mind, storing multiple records per file has some unattractive
side effects.  First, updating one record means inserting into the
middle of the file, which requires rewriting the trailing portion of
the file.  Fixed-size records would be an obvious solution, but they
are not a good fit for my target (schema-less, variable-length data).
Second, it seems that having multiple records per file would
significantly increase the need to merge changes from other nodes,
even when those nodes didn't edit the same records.  And since merges
must be committed back, this would result in more revlog noise and
more network activity to get all the nodes synchronized.
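
To make the trade-off concrete, here is a minimal sketch of the
binning scheme under discussion. It assumes one record per line in the
form "id<TAB>payload", kept sorted by id. That line format and the
helper names are assumptions made for illustration, not the format the
b extension actually uses.

```python
import bisect


def bin_name(record_id, prefix_len=3):
    """Name of the bin file for a record: the first letters of its id."""
    return record_id[:prefix_len]


def upsert(lines, record_id, payload):
    """Insert or replace a record in one bin's sorted list of lines.

    `lines` holds "id<TAB>payload" strings sorted by id. Returns the
    index of the line that was written.
    """
    keys = [line.split("\t", 1)[0] for line in lines]
    i = bisect.bisect_left(keys, record_id)
    entry = f"{record_id}\t{payload}"
    if i < len(keys) and keys[i] == record_id:
        lines[i] = entry  # existing record: overwrite in place
    else:
        lines.insert(i, entry)  # new record: keep the bin sorted
    return i
```

When the payload length changes, everything from the returned index to
the end of the bin has to be rewritten on disk, which is the first
concern above. Sorting by id does spread changes across the file
rather than piling them up at one end, but two nodes that touch
different records in the same bin still need a merge changeset.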

>
> Best wishes,
> Arne
> --
> 1w6 to respect them,
> to find them all,
> to lead them into games
> and gently bind them.
> http://1w6.org
>


More information about the Mercurial-devel mailing list