[PATCH 1 of 2 V2] hgweb: add a hook for processing LFS Batch API requests

Matt Harbison mharbison72 at gmail.com
Thu Mar 8 08:06:57 EST 2018


> On Mar 8, 2018, at 1:17 AM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
> 
>> On Wed, Mar 7, 2018 at 8:49 PM, Matt Harbison <mharbison72 at gmail.com> wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1519274700 18000
>> #      Wed Feb 21 23:45:00 2018 -0500
>> # Node ID 89382cb20bb19e089513b2ce69ef8acfa1f523fd
>> # Parent  6dab3bdb1f00932a80ffa07f80ba240c3a4b48df
>> hgweb: add a hook for processing LFS Batch API requests
>> 
>> There really isn't a clean way to give LFS a crack at intercepting the requests
>> without hardcoding some LFS knowledge in the core.  The rationale for this URI
>> is that the spec for the Batch API[1] defines the URL as the LFS server url +
>> '/objects/batch'.  The default git URLs are:
>> 
>>     Git remote: https://git-server.com/foo/bar
>>     LFS server: https://git-server.com/foo/bar.git/info/lfs
>>     Batch API: https://git-server.com/foo/bar.git/info/lfs/objects/batch
>> 
>> '.git/' seems like it's not something a user would normally track.  If we adhere
>> to how git defines the URLs, then the hg-git extension should be able to talk to
>> a git based server without any additional work.
>> 
>> I'm not sure if checking the User-Agent is a good idea, but this needs a
>> specialized client, and it seems like everyone else is doing it (3d48ae1aaa5e,
>> e7bb5fc4570c).  We can always back this off if it becomes a nuisance.  A web
>> browser will see "400: bad method", the same as it would before this change.
>> 
>> I'm not sure if None or 'pull' is the proper permission check, but the only
>> difference is whether or not `hgweb.allowpull` is checked.  Since nothing of
>> particular interest is transferred here, and the next phase handles the read or
>> write, treating this like web interface request seems fine.
>> 
>> [1] https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md
>> 
>> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
>> --- a/mercurial/hgweb/hgweb_mod.py
>> +++ b/mercurial/hgweb/hgweb_mod.py
>> @@ -90,6 +90,12 @@
>>          urlel = os.path.dirname(urlel)
>>      return reversed(breadcrumb)
>> 
>> +def _processlfsbatchreq(repo, req):
>> +    """A hook for the LFS extension to wrap that handles requests to the Batch
>> +    API, and returns the appropriate JSON response.
>> +    """
>> +    raise ErrorResponse(HTTP_NOT_FOUND)
>> +
>>  class requestcontext(object):
>>      """Holds state/context for an individual request.
>> 
>> @@ -376,6 +382,21 @@
>>              except ErrorResponse as inst:
>>                  return protohandler['handleerror'](inst)
>> 
>> +        # Route LFS Batch API requests to the appropriate handler
>> +
>> +        if req.env.get(r'HTTP_USER_AGENT', r'').startswith(r'git-lfs/'):
> 
> I don't like filtering by the user-agent. It is recommended to not do this. Unless it will cause problems or tons of code complexity to properly lock out actual Git clients who may get confused by this, my vote is to remove it and just rely on path filtering.

Ok, I can ditch it.  It was more about locking out web browsers and other things not POSTing a JSON request, without changing the current behavior they see.  But I guess there are other ways to remotely figure out the minimum server version.

>> +            try:
>> +                path = req.env.get(r'PATH_INFO')
>> +                if path == '/.git/info/lfs/objects/batch':
> 
> Relying on PATH_INFO is super annoying. But our code kind of sucks :/
> 
> My only suggestion would be to define `parts` in the `else` block of the `if r'PATH_INFO' in req.env:` above and check `parts == ['.git', 'info', 'lfs', 'objects', 'batch']`. That's still a big ugly though.

OoC, what’s the point of this?  I saw it in the code above this, but it seems like string handling is easier than list-of-string handling.  This looks like an easy enough change though.

> Whatever happens, my guess is I will eventually write some patches to clean the URL parsing code up.
>  
>> +                    self.check_perm(rctx, req, None)
> 
> Shouldn't this be `pull` instead of `None`? Otherwise, LFS won't honor web.* to restrict data access.

I went back and forth, but landed on this because it doesn't seem like a pull, and still checks web.deny_read and web.allow_read.  It looks like None just doesn’t check web.allowpull.  This would be checked prior to the actual file transfer in the next step.

Is there a concept of a write-only repo?  Like a try server where you can only push stuff, but not pull all of the anonymous heads.  If not, then I think I can change it.

>> +                    return _processlfsbatchreq(rctx.repo, req)
>> +                else:
>> +                    raise ErrorResponse(HTTP_NOT_FOUND)
>> +            except ErrorResponse as inst:
>> +                req.respond(inst, 'text/plain; charset=utf-8')
>> +                # No body, since only lfs clients are allowed here
>> +                return ['']
>> +
>>          # translate user-visible url structure to internal structure
>> 
>>          args = query.split('/', 2)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180308/a531f81a/attachment.html>


More information about the Mercurial-devel mailing list