Converting big files

Greg Ward greg at gerg.ca
Sun Apr 11 15:32:20 CDT 2010


Hi folks --

a bit of a design debate has arisen with bfiles, and I want some
outside input.  Here's the scenario: you have a Mercurial repo that
currently tracks large binary files as regular files.  You want to
switch to using bfiles, i.e. get those large binary files out of
.hg/store and put them somewhere else.  (Quick summary: bfiles works
by tracking a "standin file", .hgbfiles/<bigfile>, for each <bigfile>.
That's a 41-byte file containing the 40-character hex SHA-1 hash of
the big file's content plus a newline.  The actual big files live on a
central store somewhere.  The central store can be a filesystem path
(local/NFS/SMB/whatever), an HTTP URL, or an SSH URL.  It's structured
like <bigfile>/<hash>, i.e. every big file is a directory containing
multiple revisions, whose filenames are just the SHA-1 hash of the
contents.)
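
To make the layout above concrete, here is a minimal sketch of what
adding one big file would involve, assuming a local filesystem store.
It is not bfiles' actual code: add_bigfile, sha1_hex and the
store_root argument are hypothetical names for illustration.  Only the
.hgbfiles/<bigfile> standin path, the 41-byte standin content and the
<bigfile>/<hash> store layout come from the description above.

```python
import hashlib
import os
import shutil

STANDIN_DIR = ".hgbfiles"  # standin directory named in the message


def sha1_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_bigfile(repo_root, bigfile, store_root):
    """Copy one big file into the store and write its standin.

    Hypothetical helper: bigfile is a path relative to repo_root, and
    store_root is assumed to be a local filesystem store.
    """
    source = os.path.join(repo_root, bigfile)
    file_hash = sha1_hex(source)

    # Store layout from the message: <bigfile>/<hash>
    store_path = os.path.join(store_root, bigfile, file_hash)
    os.makedirs(os.path.dirname(store_path), exist_ok=True)
    if not os.path.exists(store_path):
        shutil.copyfile(source, store_path)

    # Standin: 40 hex digits plus a newline, 41 bytes in total
    standin = os.path.join(repo_root, STANDIN_DIR, bigfile)
    os.makedirs(os.path.dirname(standin), exist_ok=True)
    with open(standin, "wb") as f:
        f.write(file_hash.encode("ascii") + b"\n")
    return standin, store_path
```

Because the store filename is the content hash, storing the same
revision twice is a no-op, which is why the sketch skips the copy when
the target already exists.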

So, we need a tool to convert an existing repo with some big files in
it to a new repo where those big files are replaced by standin files
plus a new central store containing the actual big files.

Idea #1: bfiles should wrap/extend 'hg convert' and provide a way to
specify which files are big.
pro:
  - use existing convert machinery, so less to reinvent
  - could probably work for conversions from svn etc. as well as hg->hg
con:
  - extending an extension feels fragile -- presumably even less
    stable API than core hg
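
Whichever idea wins, the converter needs some rule for deciding which
files count as big.  One possible shape for that rule, purely as an
assumption for discussion: match on filename patterns, and fall back
to a size threshold.  The function name, the default patterns and the
10 MB threshold below are all hypothetical; none of them are existing
'hg convert' or bfiles options.

```python
import fnmatch
import os

# Hypothetical defaults, chosen only for illustration
DEFAULT_PATTERNS = ("*.iso", "*.zip")
DEFAULT_MIN_SIZE = 10 * 1024 * 1024  # 10 MB


def is_bigfile(path, size, patterns=DEFAULT_PATTERNS, min_size=DEFAULT_MIN_SIZE):
    """Decide whether a file should be converted to a standin.

    A file qualifies if its basename matches any glob pattern, or if
    its size in bytes is at least min_size.
    """
    name = os.path.basename(path)
    # fnmatchcase keeps matching case-sensitive on every platform
    if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
        return True
    return size >= min_size
```

A pattern-only rule gives the same answer for a file in every
revision, whereas a size threshold could classify one file differently
as it grows, so a real tool would have to pick one behaviour for that
case.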

Idea #2: implement something separate from 'hg convert'
  2a: new command implemented by bfiles extension (bfconvert?)
  2b: new extension entirely
  2c: new standalone script
pro:
  - unaffected by API changes in hgext.convert (albeit still subject
    to the whims of hg's and bfiles' APIs)
con:
  - risk of reinventing wheels that already exist in hgext.convert
  - might only work for hg -> hg conversions

Am I missing any pros or cons for these two ideas?  Or is there
another way to implement this that I have not thought of?  Other
thoughts, opinions, ideas?

Thanks --

Greg

