[RFC] kbfiles: an extension to track binary files with less wasted bandwidth

Martin Geisler mg at lazybytes.net
Thu Sep 22 11:37:48 CDT 2011


"Na'Tosha Bard" <natosha at unity3d.com> writes:

> So, to pick this topic up again, can we get an open punchlist of
> things that the mercurial community (and project leader) believes is
> "missing" for the largefiles extension? E.g., what is missing for it to
> be accepted into mercurial?

I guess you'll have to patchbomb it here eventually. Also, you could
describe the features in a mail here -- I found a usage.txt file in the
repository which seems relevant:

  Largefiles allows for tracking large, incompressible binary files in
  Mercurial without requiring excessive bandwidth for clones and pulls.
  Files added as largefiles are not tracked directly by Mercurial;
  rather, their revisions are identified by a checksum, and Mercurial
  tracks these checksums. This way, when you clone a repository or pull
  in changesets, the large files in older revisions of the repository
  are not needed, and only the ones needed to update to the current
  version are downloaded. This saves both disk space and bandwidth.
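
A minimal sketch of the checksum idea described above. It assumes SHA-1 as the checksum (usage.txt only says "checksum") and a hypothetical "standin" layout in which the small tracked file holds nothing but the hex digest. The names largefile_hash and write_standin are illustrative, not the extension's API.

```python
import hashlib
import os
import tempfile

def largefile_hash(path, chunk_size=128 * 1024):
    """Hex SHA-1 of a file, read in chunks so the whole file is never in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_standin(standin_path, largefile_path):
    """Write the small file Mercurial would track instead: just the hash (assumed layout)."""
    digest = largefile_hash(largefile_path)
    with open(standin_path, "w") as f:
        f.write(digest + "\n")
    return digest

# Usage: hash a 1 MB file and record the digest in its standin.
workdir = tempfile.mkdtemp()
big = os.path.join(workdir, "thisfileislarge")
with open(big, "wb") as f:
    f.write(b"\0" * (1024 * 1024))
standin = os.path.join(workdir, "thisfileislarge.standin")
digest = write_standin(standin, big)
print(digest)  # 40 hex characters
```

Only the 41-byte standin enters history, so cloning old revisions moves hashes rather than file contents.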

  If you are starting a new repository or adding new large binary files,
  using largefiles for them is as easy as adding '--large' to your hg
  add command. For example:

  $ dd if=/dev/urandom of=thisfileislarge count=2000
  $ hg add --large thisfileislarge
  $ hg commit -m 'add thisfileislarge, which is large, as a largefile'

  When you push a changeset that affects largefiles to a remote
  repository, its largefile revisions will be uploaded along with it.
  Note that the remote Mercurial must also have the largefiles extension
  enabled for this to work.
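
A sketch of the set logic a push could use to decide which largefile revisions to upload. It assumes (not stated in usage.txt) that the client can ask the remote store whether it already holds a given hash, and it skips those. largefiles_to_upload and remote_has are hypothetical names.

```python
def largefiles_to_upload(outgoing_hashes, remote_has):
    """Return the hashes the remote still needs, first-seen order, no duplicates.

    outgoing_hashes: hashes read from standins touched by the outgoing
    changesets (may repeat). remote_has: callable, True if the remote
    store already holds that hash (assumed capability).
    """
    needed = []
    seen = set()
    for digest in outgoing_hashes:
        if digest in seen:
            continue  # the same revision can be referenced by several changesets
        seen.add(digest)
        if not remote_has(digest):
            needed.append(digest)
    return needed

# Usage: three outgoing references to two distinct revisions; the remote has one.
remote_store = {"b" * 40}
outgoing = ["a" * 40, "b" * 40, "a" * 40]
print(largefiles_to_upload(outgoing, remote_store.__contains__))
```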

  When you pull a changeset that affects largefiles from a remote
  repository, Mercurial behaves as it normally does. However, when you
  update to such a revision, any largefiles needed by that revision are
  downloaded and cached if they have never been downloaded before. This
  means that network access is required to update to a revision whose
  largefiles you have not downloaded yet.
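
A minimal sketch of the download-on-update behaviour: look the hash up in a local cache and fetch it only on a miss. The flat cache layout (one file per hash) and the download callable are assumptions for illustration; fetch_largefile is a hypothetical name.

```python
import os
import tempfile

def fetch_largefile(digest, cache_dir, download):
    """Return the cached path for digest, calling download() only on a cache miss."""
    cached = os.path.join(cache_dir, digest)
    if not os.path.exists(cached):
        os.makedirs(cache_dir, exist_ok=True)
        partial = cached + ".part"
        with open(partial, "wb") as out:
            out.write(download(digest))
        # Rename into place so an interrupted download never leaves a
        # half-written file under the final cache name.
        os.replace(partial, cached)
    return cached

# Usage with a fake downloader that records how often it is called.
calls = []

def fake_download(digest):
    calls.append(digest)
    return b"contents of " + digest.encode("ascii")

cache = tempfile.mkdtemp()
first = fetch_largefile("ab" * 20, cache, fake_download)
second = fetch_largefile("ab" * 20, cache, fake_download)  # cache hit, no network
print(len(calls))  # 1
```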

  If you already have large files tracked by Mercurial without the
  largefiles extension, you will need to convert your repository in
  order to benefit from largefiles. This is done with the 'hg lfconvert'
  command:

  $ hg lfconvert --size 10 oldrepo newrepo

  By default, in repositories that already have largefiles in them, any
  new file over 10MB will automatically be added as a largefile. To
  change this threshold, set [largefiles].size in your Mercurial config
  file to the minimum size in megabytes to track as a largefile, or use
  the --lfsize option to the add command (also in megabytes):

  [largefiles]
  size = 2

  $ hg add --lfsize 2
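
A sketch of the size rule above. Two assumptions are labelled here: --lfsize takes precedence over [largefiles].size, and a file exactly at the threshold counts as large. Python's configparser stands in for Mercurial's own hgrc parser, which supports more syntax. min_size_mb and should_be_largefile are hypothetical names.

```python
import configparser
import os
import tempfile

MB = 1024 * 1024
DEFAULT_MIN_SIZE_MB = 10.0  # default quoted in usage.txt

def min_size_mb(config_text, lfsize_option=None):
    """Threshold in MB: --lfsize if given (assumed precedence), else [largefiles].size, else 10."""
    if lfsize_option is not None:
        return float(lfsize_option)
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("largefiles", "size", fallback=DEFAULT_MIN_SIZE_MB)

def should_be_largefile(path, threshold_mb):
    """True if the file is at least threshold_mb megabytes."""
    return os.path.getsize(path) >= threshold_mb * MB

# Usage: a 3 MB file checked against the three ways of setting the threshold.
config = "[largefiles]\nsize = 2\n"
path = os.path.join(tempfile.mkdtemp(), "texture.bin")
with open(path, "wb") as f:
    f.truncate(3 * MB)  # extends the empty file to 3 MB
print(should_be_largefile(path, min_size_mb(config)))                     # True: 2 MB from config
print(should_be_largefile(path, min_size_mb(config, lfsize_option="5")))  # False: --lfsize 5
print(should_be_largefile(path, min_size_mb("")))                         # False: 10 MB default
```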

  The [largefiles].patterns config option lets you list space-separated
  filename patterns (in shell glob syntax) that should always be tracked
  as largefiles:

  [largefiles]
  patterns = *.jpg *.{png,bmp} library.zip content/audio/*
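
A sketch of how the patterns value could be interpreted: split on spaces, expand {a,b} alternatives, then glob-match repository-relative paths. It uses Python's fnmatch, where * also matches "/"; Mercurial's own glob matcher differs in details, so treat this as an approximation. expand_braces, compile_patterns and is_largefile_by_pattern are hypothetical names.

```python
import fnmatch
import re

def expand_braces(pattern):
    """Expand the first innermost {a,b} group, then recurse on each result."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    out = []
    for alt in m.group(1).split(","):
        out.extend(expand_braces(head + alt + tail))
    return out

def compile_patterns(config_value):
    """Turn the space-separated config value into a flat list of globs."""
    pats = []
    for word in config_value.split():
        pats.extend(expand_braces(word))
    return pats

def is_largefile_by_pattern(path, patterns):
    """True if the repository-relative path matches any glob (case-sensitive)."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)

pats = compile_patterns("*.jpg *.{png,bmp} library.zip content/audio/*")
print(pats)  # ['*.jpg', '*.png', '*.bmp', 'library.zip', 'content/audio/*']
print(is_largefile_by_pattern("content/audio/theme.ogg", pats))  # True
print(is_largefile_by_pattern("README.txt", pats))               # False
```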

I tried cloning the largefiles repo into the hgext folder in Mercurial
and ran

  % pyflakes hgext/largefiles/*.py
  hgext/largefiles/basestore.py:15: 'shutil' imported but unused
  hgext/largefiles/basestore.py:17: 'error' imported but unused
  hgext/largefiles/basestore.py:17: 'url_' imported but unused
  hgext/largefiles/lfutil.py:39: redefinition of function 'dirstate_walk' from line 35
  hgext/largefiles/localstore.py:57: undefined name 'err'
  hgext/largefiles/overrides.py:13: 're' imported but unused
  hgext/largefiles/overrides.py:28: 'proto' imported but unused
  hgext/largefiles/overrides.py:611: local variable 'dest' is assigned to but never used
  hgext/largefiles/overrides.py:662: redefinition of function 'write' from line 647
  hgext/largefiles/proto.py:7: 'shutil' imported but unused
  hgext/largefiles/proto.py:109: undefined name 'l'
  hgext/largefiles/proto.py:126: undefined name 'capabilities_orig'
  hgext/largefiles/proto.py:155: undefined name 'ssh_oldcallstream'
  hgext/largefiles/proto.py:162: undefined name 'http_oldcallstream'
  hgext/largefiles/remotestore.py:57: undefined name 'HTTPError'
  hgext/largefiles/remotestore.py:61: undefined name 'urllib2'
  hgext/largefiles/remotestore.py:86: local variable 'expect_hash' is assigned to but never used
  hgext/largefiles/remotestore.py:95: undefined name 'store_path'
  hgext/largefiles/remotestore.py:100: undefined name 'store_path'
  hgext/largefiles/reposetup.py:15: 'httprepo' imported but unused
  hgext/largefiles/reposetup.py:34: undefined name '_'
  hgext/largefiles/reposetup.py:224: redefinition of unused 'node' from line 15

You should look into those errors.

> The main repository is living here:
> https://developers.kilnhg.com/Repo/Kiln/largefiles/largefiles
>
> (there's also a branch with some compatibility stuff that's useful for
> Kiln users, but that is not so relevant here).
>
> Cheers,
> Na'Tosha

-- 
Martin Geisler

Mercurial links: http://mercurial.ch/