LFS todo list

Matt Harbison mharbison72 at gmail.com
Sat Apr 21 02:51:28 UTC 2018


As promised, here's the braindump of ideas that I've been tracking for
LFS.  It's not in any particular order, and is likely incomplete.  Probably
not everything is a good idea, and some items have BC implications that
would delay removing the experimental label.  But Pulkit was wondering
about the status, so it seems worth sharing in case anyone else has ideas
or wants to pitch in.


Support blob transfer with `hg serve`
   - Support paths.default:lfs = ... style paths
   - SSH -> https server inference
   - LFS-Authenticate header support in client and server(?)
   - Why is copying the Authentication header into the JSON necessary?
   - Add support for transfer quotas?
   - Teach the server to redirect transfers to an external server?  (Needs
     config and server side support for verify action.)
   - Check for local disk space before allowing upload?
   - Download should be able to send the file in chunks, without reading the
     whole thing into memory
   - Make sure the http codes used are appropriate
   - Support for resuming transfers
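
The chunked download item above could look roughly like the following.
This is only a sketch of the general technique, not the lfs extension's
code: `iter_chunks`, `send_blob` and the 1 MiB chunk size are names and
values made up for illustration, and `dst` stands in for whatever the
server writes the response body to.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary choice for this sketch


def iter_chunks(fp, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a binary file object until EOF."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        yield chunk


def send_blob(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time.

    Returns (sha256 hex digest, total size), so the caller can compare
    them with the oid and size from the pointer without ever holding the
    whole blob in memory.
    """
    digest = hashlib.sha256()
    total = 0
    for chunk in iter_chunks(src, chunk_size):
        digest.update(chunk)
        dst.write(chunk)
        total += len(chunk)
    return digest.hexdigest(), total
```

Memory use is bounded by the chunk size rather than the blob size, and
the same loop would serve an upload path as well.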

Handle server with extension loaded and client without it more gracefully.
   - indygreg has an idea for autoloading the extension, when needed.
   - changegroup3 is still experimental, and not enabled by default.

Stop uploading blobs when pushing between local repos
   - Could probably hardlink directly to the other local repo's store
   - Support inferring lfs.url for local push/pull
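
A minimal sketch of the hardlink idea, assuming both stores are plain
directories of blob files.  `link_or_copy` is a hypothetical helper, not
an existing Mercurial function, and the fallback to copying (for stores
on different filesystems) is an assumption about what would be wanted.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Make the blob at src available at dst without re-uploading it.

    Tries a hardlink first; if that fails (different filesystem, or no
    hardlink support), falls back to a plain copy.
    """
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if os.path.exists(dst):
        return "exists"
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        shutil.copyfile(src, dst)
        return "copied"
```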

Stop uploading blobs on strip/amend/histedit/etc.  (Keep `hg bundle`)

Keep corrupt files around in 'store/lfs/incoming' for forensics
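
One way this could work, as a sketch only: hash the downloaded temp file,
and on mismatch park it under 'incoming' instead of deleting it.  The
`finish_download` and `sha256_of` helpers, and the 'objects/<2 hex>/<rest>'
layout for good blobs, are assumptions made for the example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def finish_download(lfs_dir, oid, tmp_path):
    """Move a downloaded temp file to its final home.

    Returns True if the content matched oid and was stored under
    'objects'.  Otherwise the file is kept under 'incoming', named after
    the oid it was supposed to have, and False is returned.
    """
    if sha256_of(tmp_path) == oid:
        dest = os.path.join(lfs_dir, "objects", oid[:2], oid[2:])
        ok = True
    else:
        dest = os.path.join(lfs_dir, "incoming", oid)
        ok = False
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # os.replace assumes tmp_path is on the same filesystem as lfs_dir.
    os.replace(tmp_path, dest)
    return ok
```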

Finish `hg convert` story
   - splice in .hglfs file for normal -> lfs?
   - argument to accept a rules file?
   - drop lfs.track config settings

Stop reading in entire file when passing through filelog interface
   - requires major replumbing to core
     - https://www.mercurial-scm.org/wiki/HandlingLargeFiles

Show to-be-applied rules with `hg files -r 'wdir()'`
   - debugignore can show file + line number, so a dedicated command could
     be useful too.

Filesets and templates (maybe don't need revset with 'file(set:...)')
   - oid and pointer on general keywords, IFF the file is a blob
   - drop existing items that would be redundant with general support

Add a flag that's visible in `hg files -v`?

Fix https multiplexing, and re-enable workers.

Output cleanup
   - Can we print the url when connecting to the blobstore?  (A sudden
     connection refused after pulling commits looks confusing.)  Problem is,
     'pushing to main url' is printed, and then lfs wants to upload before
     going back to the main repo transfer, so then *that* could be confusing
     with extra output.

     - Can abort message hint about setting lfs.url, if it's not set?

   - Add more progress indicators?  Uploading a large repo looks idle for a
     long time while it scans for blobs in each outgoing revision.

   - Print filenames instead of hashes in error messages
     - subrepo aware paths, where necessary

   - Is existing output at the right status/note/debug level?

Add locks on cache and blob store

Teach client to handle lfs verify action.

Are proper file sizes reported in debugupgraderepo?

Garbage collection (issue5790)

Compressing stored blobs?
   - 700MB repo becomes 2.5GB with all lfs blobs
   - What implications for filesystem paths that don't indicate compression?
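
For the sake of discussion, a sketch of what compressing blobs at rest
might involve, using zlib purely as an example codec.  `pack_blob` and
`unpack_blob` are hypothetical names.  The point it illustrates is that
the oid stays the hash of the uncompressed content, so the path alone
can't tell you whether the bytes on disk are compressed.

```python
import hashlib
import zlib


def pack_blob(data, level=6):
    """Return (oid, compressed bytes); the oid covers the raw data."""
    return hashlib.sha256(data).hexdigest(), zlib.compress(data, level)


def unpack_blob(oid, packed):
    """Decompress a stored blob and check it against its oid."""
    data = zlib.decompress(packed)
    if hashlib.sha256(data).hexdigest() != oid:
        raise ValueError("blob content does not match oid %s" % oid)
    return data
```

Anything that reads the store directly would need to go through the
equivalent of `unpack_blob`, which is where the path question bites.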

Prefetch files
   - {rawdata} template
   - verify
   - grep

Can verify be done without downloading everything?
   - If we know that we are talking to an hg server, we can leverage the
     fact that it validates in the Batch API portion, and skip d/l
     altogether.  OTOH, maybe we should download the files unconditionally
     for forensics.  The alternative is to define a custom transfer handler
     that definitively verifies without transferring, and then cache those
     results.  When verify comes looking, look in the cache instead of
     actually opening the file and processing it.

   - Yuya has concerns about when blob fetch takes place vs when revlog is
     verified.  Since the visible hash matches the blob content, I don't
     think there's a way to verify the pointer file that's actually stored
     in the filelog (other than basic JSON checks).  Full verification
     requires the blob.

   - Opening a corrupt pointer file aborts.  It probably shouldn't for
     verify.
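
The "cache those results" alternative above might be sketched like this.
`VerifyCache` and its JSON-file storage are invented for illustration;
how a transfer handler would actually get a definitive answer from the
server is left open here.

```python
import json
import os


class VerifyCache:
    """Remember which oids have already been verified remotely."""

    def __init__(self, path):
        self.path = path
        self.oids = set()
        if os.path.exists(path):
            with open(path) as fp:
                self.oids = set(json.load(fp))

    def mark_verified(self, oid):
        """Record that the server vouched for this oid."""
        self.oids.add(oid)

    def is_verified(self, oid):
        """What verify would ask instead of opening the blob."""
        return oid in self.oids

    def save(self):
        """Persist the cache as a sorted JSON list of oids."""
        with open(self.path, "w") as fp:
            json.dump(sorted(self.oids), fp)
```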

Can grep avoid downloading most things?

Is a command to download everything needed?

Export currently writes out the LFS blob.  Should it write the pointer
instead?
   - diff is similar, and probably shouldn't see the pointer file

Is any hg-git work needed?

Remove lfs.retry hack in client?

