Potential BC break on largefiles/lfs

Matt Harbison mharbison72 at gmail.com
Wed Sep 19 23:28:28 EDT 2018


On Wed, 19 Sep 2018 21:41:20 -0400, Gregory Szorc  
<gregory.szorc at gmail.com> wrote:

> Currently, the largefiles and lfs extensions monkeypatch various  
> functions
> so that when they encounter repository data indicative of largefiles or
> lfs, they automagically add a repository requirement for largefiles/lfs.
>
> This behavior is arguably user-friendly and therefore good.
>
> At the same time, it is also a bit surprising. Repository requirements  
> (the
> .hg/requires file) is ideally read-only after repository creation. This
> ensures that we are always able to open/read a repository with the
> Mercurial client that created it. Silently adding requirements when
> interacting with remote servers or performing a local commit that
> introduces a well-named file undermines that guarantee.

I'm not sure how important always being able to read a repo with the  
original client is, outside of Mercurial development.  We've only got a  
small group at work (~30 developers), but we don't go backward.  And when  
we rolled out LFS, I *wanted* to force client upgrades so the extension  
would be available.  Maybe a large open source community is different in  
that regard, but they would seem less likely to use these extensions.

> Furthermore, I'm attempting to refactor how repository objects are
> constructed. Essentially I want to make the repository type dynamically
> derived from .hg/requires. i.e. instead of monkeypatching repo objects  
> post
> construction, we'll build a type that is custom tailored for that exact
> repository. Unfortunately, largefiles and lfs dynamically changing the
> requirements - and therefore behavior of a repository object -
> mid-operation (like an `hg pull`) undermines my "repo type is static for
> its lifetime" goals.
>
> I wanted to take a quick temperature check to see if we're comfortable
> incurring a BC break around this auto-add-requirements behavior.

I'm torn on this.  I don't want to hold up your refactoring, but the  
single most frustrating thing about my early work with LFS was getting  
cryptic errors when the extension wasn't loaded on one side or the other  
for various reasons.  We've gotten through a couple of cycles with it  
bundled, so maybe it's a better situation than late last year.

That said, the dynamic requirements update doesn't always work[1].  I  
submitted a patch earlier this year that allowed the extension to be  
implicitly loaded, which was rejected, but you offered alternative  
ideas[2].  I wasn't sure what to do with that, so I haven't gotten back to  
it.  But when we recently rolled out LFS at work, there were a couple of  
SNAFUs where the extension ended up not getting added to the config,  
causing the unknown flags error.  So getting it loaded on demand seems  
more important than dynamically updating requirements.  That still leaves  
old clients that don't have the extension bundled out in the cold if we  
fail to add the requirement.  (BTW, committing LFS works without adding  
any requirement, or did when I started.  So this is maybe only a  
push/pull/unbundle issue.  But if the extension is loaded on the server  
side too, that won't fail either.  So that just increases the chance for a  
client to be confused somewhere.)

I'm not a fan of needing to run an explicit command- any time an app says  
"run this", my reaction is "why didn't you just do that for me?"  (I  
realize it can be an elaborate "are you really sure? [y/n]" prompt, but  
with these extensions, it seems like the choice has already been made if  
it's actively used on either side.)  Also, what would the UX be for  
enabling it on say, a BitBucket repo?

> I'm proposing that instead of auto-adding requirements when data is
> encountered, we hard fail when incoming data references largefiles or  
> lfs.
> The error message would say something like "largefiles/lfs not enabled on
> this repository; enable largefiles/lfs via `hg debugenablelfs` and retry
> this operation. See <URL> for more info." This error would manifest:
>
> * During `hg pull`, immediately after retrieving the remote's  
> capabilities
> (before any data has been transferred).
> * During `hg unbundle`, when the first data referencing largefiles/lfs is
> encountered.
> * During `hg commit` if the committed data involves LFS.
>
> The most user hostile is `hg unbundle`, as we won't see an error until  
> some
> data has been transferred. So you could potentially transfer megabytes of
> data *then* get the error. The other errors should be ~immediate.
>
> When cloning from a repo with largefiles/lfs enabled, we would continue  
> to
> automagically set up the repo for largefiles/lfs from its onset. So this
> proposed change would only impact people who have existing clones then
> attempt to pull/unbundle/commit data that introduces largefiles/lfs. In
> other words, introducing largefiles/lfs into an existing repository would
> require all existing clones to take manual action to enable it.
>
> If we can't incur this BC break, it makes my alternate storage backend  
> work
> a little harder. But I think I can work around it.
>
> An alternative to the BC break would be supporting largefiles/lfs by
> default. i.e. the .hg/requires entry wouldn't be necessary to enable the
> feature. But this would entail moving basic functionality to core and/or
> raising an error when attempting to resolve largefiles/lfs data without  
> the
> extension loaded.

I'd be interested in supporting LFS by default, whenever I get around to  
finishing it.  But IDK if that would help the case of a repo being  
upgraded with a new client, and pushed to an old one that doesn't auto  
load the extension and/or minimal support.

I'm not sure if we can do that with largefiles, given how invasive its  
wrapping is.  There are some commands that output differently just by  
having the extension loaded, even with no actual largefiles.  (It's been  
awhile since I've used largefiles, so I don't recall an example.)

> Thoughts?

[1] https://bz.mercurial-scm.org/show_bug.cgi?id=5980
[2]  
https://www.mercurial-scm.org/pipermail/mercurial-devel/2018-April/115830.html


More information about the Mercurial-devel mailing list