Potential BC break on largefiles/lfs

Gregory Szorc gregory.szorc at gmail.com
Thu Sep 20 11:53:49 EDT 2018


On Wed, Sep 19, 2018 at 8:28 PM Matt Harbison <mharbison72 at gmail.com> wrote:

> On Wed, 19 Sep 2018 21:41:20 -0400, Gregory Szorc
> <gregory.szorc at gmail.com> wrote:
>
> > Currently, the largefiles and lfs extensions monkeypatch various
> > functions so that when they encounter repository data indicative of
> > largefiles or lfs, they automagically add a repository requirement for
> > largefiles/lfs.
> >
> > This behavior is arguably user-friendly and therefore good.
> >
> > At the same time, it is also a bit surprising. Repository requirements
> > (the .hg/requires file) are ideally read-only after repository creation.
> > This ensures that we are always able to open/read a repository with the
> > Mercurial client that created it. Silently adding requirements when
> > interacting with remote servers or performing a local commit that
> > introduces a well-named file undermines that guarantee.
>
> I'm not sure how important always being able to read a repo with the
> original client is, outside of Mercurial development.  We've only got a
> small group at work (~30 developers), but we don't go backward.  And when
> we rolled out LFS, I *wanted* to force client upgrades so the extension
> would be available.  Maybe a large open source community is different in
> that regard, but they would seem less likely to use these extensions.
>
> > Furthermore, I'm attempting to refactor how repository objects are
> > constructed. Essentially I want to make the repository type dynamically
> > derived from .hg/requires, i.e. instead of monkeypatching repo objects
> > post construction, we'll build a type that is custom-tailored for that
> > exact repository. Unfortunately, largefiles and lfs dynamically changing
> > the requirements - and therefore the behavior of a repository object -
> > mid-operation (like an `hg pull`) undermines my "repo type is static for
> > its lifetime" goals.
> >
> > I wanted to take a quick temperature check to see if we're comfortable
> > incurring a BC break around this auto-add-requirements behavior.
>
> I'm torn on this.  I don't want to hold up your refactoring, but the
> single most frustrating thing about my early work with LFS was getting
> cryptic errors when the extension wasn't loaded on one side or the other
> for various reasons.  We've gotten through a couple of cycles with it
> bundled, so maybe it's a better situation than late last year.
>
> That said, the dynamic requirements update doesn't always work[1].  I
> submitted a patch earlier this year that allowed the extension to be
> implicitly loaded, which was rejected, but you offered alternative
> ideas[2].  I wasn't sure what to do with that, so I haven't gotten back
> to it.  But when we recently rolled out LFS at work, there were a couple
> of SNAFUs where the extension ended up not getting added to the config,
> causing the unknown flags error.  So getting it loaded on demand seems
> more important than dynamically updating requirements.  That still leaves
> old clients that don't have the extension bundled out in the cold if we
> fail to add the requirement.  (BTW, committing LFS works without adding
> any requirement, or did when I started.  So this is maybe only a
> push/pull/unbundle issue.  But if the extension is loaded on the server
> side too, that won't fail either.  So that just increases the chance for
> a client to be confused somewhere.)
>

I totally understand your concerns about cryptic errors due to a missing
LFS extension. I believe the poor messaging in low-level revlog error
reporting is to blame. I'm pretty sure you'll get an error like "missing
processor for flag <int>." Assuming the Mercurial client can map an integer
flag back to a constant, we should be able to print a more actionable error
there, e.g. for the REVIDX_EXTSTORED flag we could say "try enabling the
lfs extension."

It's also worth noting that I'm pretty sure the presence of a loaded LFS
extension on a multi-repo server can result in accidental introduction of
LFS on *any* repository. See D4646 and subsequent patches for how I'm
fixing this for the narrow extension. (We should also change LFS to not
modify the global flag processors.)
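
To illustrate the direction with a toy sketch (the class and method names
here are invented, not the actual D4646 change): each repo instance carries
its own table of flag processors instead of the extension mutating a
module-level registry, so loading lfs for one repo on a multi-repo server
doesn't change the behavior of every other repo served by the same process.

    class repostub(object):
        """Toy stand-in for a repository object."""

        def __init__(self):
            # Per-instance registry: flag bit -> (read, write, raw) transforms.
            # Today this state effectively lives in a global table.
            self._flagprocessors = {}

        def addflagprocessor(self, flag, processors):
            if flag in self._flagprocessors:
                raise ValueError('flag 0x%x already registered' % flag)
            self._flagprocessors[flag] = processors

    # The lfs extension would then register against the specific repo object
    # it wraps rather than against state shared by every repository.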

I completely forgot about [2]! As it turns out, my recent refactoring
around how local repo objects are instantiated implements some of the ideas
I wrote about in that comment. It should now be trivial to have entries in
.hg/requires imply e.g. loading an extension. I actually woke up this
morning thinking we should have the "lfs" requirement imply
"extensions.lfs=" (and possibly do away with adding the config lines to the
.hg/hgrc on repo creation).
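
Roughly like this (a sketch with invented names; the real lookup would live
in the new repo construction code):

    import os

    # Requirements that imply an extension should be loaded.
    REQUIREMENT_EXTENSIONS = {
        'lfs': 'lfs',
        'largefiles': 'largefiles',
    }

    def impliedextensions(repopath):
        """Return extension names implied by entries in .hg/requires."""
        requirespath = os.path.join(repopath, '.hg', 'requires')
        try:
            with open(requirespath, 'rb') as fh:
                requirements = set(fh.read().decode('ascii').splitlines())
        except IOError:
            return []
        return sorted(REQUIREMENT_EXTENSIONS[r]
                      for r in requirements
                      if r in REQUIREMENT_EXTENSIONS)

    # e.g. for an lfs repo, impliedextensions('/path/to/repo') -> ['lfs'],
    # which the loader would treat like "extensions.lfs=" without needing a
    # line in .hg/hgrc.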


>
> I'm not a fan of needing to run an explicit command- any time an app says
> "run this", my reaction is "why didn't you just do that for me?"  (I
> realize it can be an elaborate "are you really sure? [y/n]" prompt, but
> with these extensions, it seems like the choice has already been made if
> it's actively used on either side.)  Also, what would the UX be for
> enabling it on say, a BitBucket repo?
>

Yeah - I don't like when apps do that either. But at the same time, I also
don't like when apps make potentially breaking changes to my data without
warning me about it first! (I concede I'm a power user here and most users
don't care as long as the operation is safe.)

It's possible we could auto-add the requirement then retry the current
operation [against a new repo instance].
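
Something like this, hand-waving over the details (all names invented):

    class RequirementMissing(Exception):
        """Raised when incoming data needs a requirement the repo lacks."""

        def __init__(self, requirement):
            self.requirement = requirement

    def retrywithrequirement(makerepo, addrequirement, operation):
        """Run operation(repo); if a requirement is missing, persist it and
        retry against a freshly constructed repo so the repo type stays
        static for the lifetime of each instance."""
        repo = makerepo()
        try:
            return operation(repo)
        except RequirementMissing as err:
            addrequirement(repo, err.requirement)  # e.g. append to .hg/requires
            return operation(makerepo())           # new instance, new type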

I'm not sure what the UX would be for enabling LFS on a hosted repo. Am I
right in assuming that today, as long as the lfs extension is loaded on the
server, pushing changesets with LFS bits automagically adds the LFS
requirement on the server? That behavior is both somewhat cool and a bit
terrifying as a server operator!


>
> > I'm proposing that instead of auto-adding requirements when data is
> > encountered, we hard fail when incoming data references largefiles or
> > lfs.
> > The error message would say something like "largefiles/lfs not enabled on
> > this repository; enable largefiles/lfs via `hg debugenablelfs` and retry
> > this operation. See <URL> for more info." This error would manifest:
> >
> > * During `hg pull`, immediately after retrieving the remote's
> > capabilities (before any data has been transferred).
> > * During `hg unbundle`, when the first data referencing largefiles/lfs is
> > encountered.
> > * During `hg commit` if the committed data involves LFS.
> >
> > The most user-hostile is `hg unbundle`, as we won't see an error until
> > some data has been transferred. So you could potentially transfer
> > megabytes of data *then* get the error. The other errors should be
> > ~immediate.
> >
> > When cloning from a repo with largefiles/lfs enabled, we would continue
> > to automagically set up the repo for largefiles/lfs from the outset. So
> > this proposed change would only impact people who have existing clones
> > and then attempt to pull/unbundle/commit data that introduces
> > largefiles/lfs. In other words, introducing largefiles/lfs into an
> > existing repository would require all existing clones to take manual
> > action to enable it.
> >
> > If we can't incur this BC break, it makes my alternate storage backend
> > work a little harder. But I think I can work around it.
> >
> > An alternative to the BC break would be supporting largefiles/lfs by
> > default, i.e. the .hg/requires entry wouldn't be necessary to enable the
> > feature. But this would entail moving basic functionality to core and/or
> > raising an error when attempting to resolve largefiles/lfs data without
> > the extension loaded.
>
> I'd be interested in supporting LFS by default, whenever I get around to
> finishing it.  But IDK if that would help the case of a repo being
> upgraded with a new client and pushed to an old one that doesn't
> auto-load the extension and/or have minimal support.
>
> I'm not sure if we can do that with largefiles, given how invasive its
> wrapping is.  There are some commands that output differently just by
> having the extension loaded, even with no actual largefiles.  (It's been
> a while since I've used largefiles, so I don't recall an example.)
>

Thanks for all your input. I'll continue to poke at things today.
Hopefully I'll find some creative solutions that don't require major BC
breaks. I may also fix some rough edges in the revlog flag processing
handling since I'm already neck-deep in that code.


>
> > Thoughts?
>
> [1] https://bz.mercurial-scm.org/show_bug.cgi?id=5980
> [2]
> https://www.mercurial-scm.org/pipermail/mercurial-devel/2018-April/115830.html
>