[PATCH 1 of 2 V2] hgweb: add a hook for processing LFS Batch API requests

Gregory Szorc gregory.szorc at gmail.com
Thu Mar 8 13:38:09 EST 2018


On Thu, Mar 8, 2018 at 5:06 AM, Matt Harbison <mharbison72 at gmail.com> wrote:

>
> On Mar 8, 2018, at 1:17 AM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>
> On Wed, Mar 7, 2018 at 8:49 PM, Matt Harbison <mharbison72 at gmail.com>
> wrote:
>
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1519274700 18000
>> #      Wed Feb 21 23:45:00 2018 -0500
>> # Node ID 89382cb20bb19e089513b2ce69ef8acfa1f523fd
>> # Parent  6dab3bdb1f00932a80ffa07f80ba240c3a4b48df
>> hgweb: add a hook for processing LFS Batch API requests
>>
>> There really isn't a clean way to give LFS a crack at intercepting the
>> requests
>> without hardcoding some LFS knowledge in the core.  The rationale for
>> this URI
>> is that the spec for the Batch API[1] defines the URL as the LFS server
>> url +
>> '/objects/batch'.  The default git URLs are:
>>
>>     Git remote: https://git-server.com/foo/bar
>>     LFS server: https://git-server.com/foo/bar.git/info/lfs
>>     Batch API: https://git-server.com/foo/bar.git/info/lfs/objects/batch
>>
>> '.git/' seems like it's not something a user would normally track.  If we
>> adhere
>> to how git defines the URLs, then the hg-git extension should be able to
>> talk to
>> a git based server without any additional work.
>>
>> I'm not sure if checking the User-Agent is a good idea, but this needs a
>> specialized client, and it seems like everyone else is doing it
>> (3d48ae1aaa5e,
>> e7bb5fc4570c).  We can always back this off if it becomes a nuisance.  A
>> web
>> browser will see "400: bad method", the same as it would before this
>> change.
>>
>> I'm not sure if None or 'pull' is the proper permission check, but the
>> only
>> difference is whether or not `hgweb.allowpull` is checked.  Since nothing
>> of
>> particular interest is transferred here, and the next phase handles the
>> read or
>> write, treating this like web interface request seems fine.
>>
>> [1] https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md
>>
>> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
>> --- a/mercurial/hgweb/hgweb_mod.py
>> +++ b/mercurial/hgweb/hgweb_mod.py
>> @@ -90,6 +90,12 @@
>>          urlel = os.path.dirname(urlel)
>>      return reversed(breadcrumb)
>>
>> +def _processlfsbatchreq(repo, req):
>> +    """A hook for the LFS extension to wrap that handles requests to the
>> Batch
>> +    API, and returns the appropriate JSON response.
>> +    """
>> +    raise ErrorResponse(HTTP_NOT_FOUND)
>> +
>>  class requestcontext(object):
>>      """Holds state/context for an individual request.
>>
>> @@ -376,6 +382,21 @@
>>              except ErrorResponse as inst:
>>                  return protohandler['handleerror'](inst)
>>
>> +        # Route LFS Batch API requests to the appropriate handler
>> +
>> +        if req.env.get(r'HTTP_USER_AGENT', r'').startswith(r'git-lfs/'):
>>
>
> I don't like filtering by the user-agent. It is recommended to not do
> this. Unless it will cause problems or tons of code complexity to properly
> lock out actual Git clients who may get confused by this, my vote is to
> remove it and just rely on path filtering.
>
>
> Ok, I can ditch it.  It was more about locking out web browsers and other
> things not POSTing a JSON request, without changing the current behavior
> they see.  But I guess there are other ways to remotely figure out the
> minimum server version.
>
> +            try:
>> +                path = req.env.get(r'PATH_INFO')
>> +                if path == '/.git/info/lfs/objects/batch':
>>
>
> Relying on PATH_INFO is super annoying. But our code kind of sucks :/
>
> My only suggestion would be to define `parts` in the `else` block of the
> `if r'PATH_INFO' in req.env:` above and check `parts == ['.git', 'info',
> 'lfs', 'objects', 'batch']`. That's still a big ugly though.
>
>
> OoC, what’s the point of this?  I saw it in the code above this, but it
> seems like string handling is easier than list-of-string handling.  This
> looks like an easy enough change though.
>

It turns out my HTTP protocol work has spurred me to author a proper fix
for this madness. Expect some patches from me today to completely overhaul
this URL code.


>
> Whatever happens, my guess is I will eventually write some patches to
> clean the URL parsing code up.
>
>
>> +                    self.check_perm(rctx, req, None)
>>
>
> Shouldn't this be `pull` instead of `None`? Otherwise, LFS won't honor
> web.* to restrict data access.
>
>
> I went back and forth, but landed on this because it doesn't seem like a
> pull, and still checks web.deny_read and web.allow_read.  It looks like
> None just doesn’t check web.allowpull.  This would be checked prior to the
> actual file transfer in the next step.
>
> Is there a concept of a write-only repo?  Like a try server where you can
> only push stuff, but not pull all of the anonymous heads.  If not, then I
> think I can change it.
>

No there isn't. I'm not sure how that would be implemented, since commands
used for discovery (known, heads, etc) are called on both read and write
operations. The "push" and "pull" names aren't that good :/


>
> +                    return _processlfsbatchreq(rctx.repo, req)
>> +                else:
>> +                    raise ErrorResponse(HTTP_NOT_FOUND)
>> +            except ErrorResponse as inst:
>> +                req.respond(inst, 'text/plain; charset=utf-8')
>> +                # No body, since only lfs clients are allowed here
>> +                return ['']
>> +
>>          # translate user-visible url structure to internal structure
>>
>>          args = query.split('/', 2)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180308/4e53f98c/attachment.html>


More information about the Mercurial-devel mailing list