[PATCH RFC] largefiles: abort push when ordinary file fit largefile pattern (issue3245)

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Fri Jul 15 11:46:46 EDT 2016


At Wed, 06 Jul 2016 13:47:07 +0200,
liscju wrote:
> 
> # HG changeset patch
> # User liscju <piotr.listkiewicz at gmail.com>
> # Date 1467747171 -7200
> #      Tue Jul 05 21:32:51 2016 +0200
> # Node ID 5d1ce681c75b02482cf8a96c47609efa864b4d16
> # Parent  6a98f9408a504be455d4382801610daceac429e6
> largefiles: abort push when ordinary file fit largefile pattern (issue3245)
> 
> So far setting minsize and patterns for largefiles worked only
> locally changing which files were added/committed as largefiles.
> It did not prevent pushing ordinary files that exceeded minsize
> or matched pattern for largefiles on server.
> 
> To solve this there are two possible solutions:
> 1) Check ordinary files before pushing, if they match pattern
> for largefiles on server - abort push
> 2) Check ordinary files when there are unbundled on server,
> abort when there are files that match pattern for largefiles
> 
> This commit is an implementation of #1 idea to discuss and needs
> some changes before being merged.
> 
> This solution at first downloads minsize and patterns from server,
> check if any of files exceeds minsize or match pattern and if it
> does aborts push.
> 
> This solution works only when server/client has version of hg that
> support this checking. Idea #2 would work for any client as far as
> server has support for this, but the drawback of this idea is that
> checks are done after sending files to server.

I summarized combination of "condition" before/after pushing and
push-ability. "condition is met" means that all normal files are
smaller than minsize, and don't match against patterns.

  = ======================= ===== ==============================
       meeting condition
    =======================
    before intermed  after
  # push   revs      push   push  condition is
  = ====== ========= ====== ===== ==============================
  1   o       o        o     OK   still met
  2   o       o        x     NG   broken at the end of push
  3   o       x        x     NG   broken always in pushed revs
  4   o       x        o     ?    broken temporarily

  5   x       x        o     ?    met at the end of push
  6   x       x        x     NG?  still broken
  7   x       o        x     NG?  met temporarily
  8   x       o        o     OK   met always in pushed revs
  = ====== ========= ====== ===== ==============================

Bundled eol extension provides two functions below for
pretxnchangegroup hook or so, and it might help discussion about "?":

  - "eol.checkallhook" to check all revisions to be added
  - "eol.checkheadshook" to check only heads to be added

For issue3245, pros and cons of "check all revisions":

  pros:
    - prevent all incorrect revisions from being pushed (like #4 above)
    - maybe not so expensive for (today's rich) client side processing

  cons:
    - prevent intermediate revisions from being pushed

      they might be needed to resolve already broken condition
      cleanly along the project local workflow (like #5 above)

    - maybe expensive for server side processing

      especially for repos managing many files.

"check only heads of revisions" is vice versa.

Users might want to choose one of them according to their environment.

BTW, in "condition is already broken before push" (#5 - #8 above)
cases:

  - behave differently from #1 - #4 cases might be useful for
    "transition period" (like #5 and #6), but

  - checking whether "condition is already broken before push" at
    each exchange is expensive (especially for server side
    processing)

    caching or so should be needed.

OK, this is just an idea :-)

Other memorandum and nit picking:

  - exchanging minsize/patterns via pushkey seems easier than
    adding new wire protocol

    Is there any specific reason to exchange them via wire
    protocol ? (or I might forget or overlook ones already known
    in discussion about "largefiles at a different location"
    plan)

  - at first, providing prohibition mechanism as an independent hook
    might be better than requiring to upgrade "hg" itself (especially
    for server side)

    But in fact, I'm not sure about recent trend of maintenance policy
    on server side (keeping current version as long as possible, or
    upgrading rapidly, etc etc ...).

  - size information should be treated by util.sizetoint() and
    util.bytecount(), IMHO

    in this patch, it is always specified in "MB", and is directly
    rendered by "%0.6f" format or so.


> diff --git a/hgext/largefiles/basestore.py b/hgext/largefiles/basestore.py
> --- a/hgext/largefiles/basestore.py
> +++ b/hgext/largefiles/basestore.py
> @@ -11,7 +11,12 @@ from __future__ import absolute_import
>  
>  from mercurial.i18n import _
>  
> -from mercurial import node, util
> +from mercurial import (
> +    error,
> +    match as matchmod,
> +    node,
> +    util,
> +)
>  
>  from . import lfutil
>  
> @@ -149,6 +154,32 @@ class basestore(object):
>                  % (numrevs, numlfiles))
>          return int(failed)
>  
> +    def checkfiletopush(self, revs):
> +        lfsize, lfpats = self._getlfpatterns()
> +
> +        pats = []
> +        if lfsize:
> +            # lfsize is send in MB, 6 decimal points is to have
> +            # byte prevision
> +            pats.append("set:size('>%.6fMB')" % lfsize)
> +        if lfpats:
> +            pats.extend(lfpats)
> +
> +        if pats:
> +            for rev in revs:
> +                ctx = self.repo[rev]
> +                # cwd parameter is equal to repo.root to prevent
> +                # 'abort: pattern not under root' when hg used
> +                # with -R option
> +                lfmatcher = matchmod.match(self.repo.root,
> +                                           self.repo.root, pats, ctx=ctx)
> +                filesmatching = ctx.walk(lfmatcher)
> +
> +                for file in filesmatching:
> +                    raise error.Abort(_("Files %s in revision %s should be "
> +                                        "marked as largefiles")
> +                                      % (file, rev))
> +
>      def _getfile(self, tmpfile, filename, hash):
>          '''Fetch one revision of one file from the store and write it
>          to tmpfile.  Compute the hash of the file on-the-fly as it
> @@ -164,3 +195,8 @@ class basestore(object):
>          Returns _true_ if any problems are found!
>          '''
>          raise NotImplementedError('abstract method')
> +
> +    def _getlfpatterns(self):
> +        '''Get tuple of size and pattern of files that should be
> +        treated as large files.'''
> +        raise NotImplementedError('abstract method')
> diff --git a/hgext/largefiles/localstore.py b/hgext/largefiles/localstore.py
> --- a/hgext/largefiles/localstore.py
> +++ b/hgext/largefiles/localstore.py
> @@ -64,3 +64,16 @@ class localstore(basestore.basestore):
>                          % (cset, filename, storepath))
>                      failed = True
>          return failed
> +
> +    def _getlfpatterns(self):
> +        lfsize = self.remote.ui.config(lfutil.longname, 'minsize', default=10)
> +        if lfsize:
> +            try:
> +                lfsize = float(lfsize)
> +            except ValueError:
> +                lfsize = None
> +
> +        lfpats = self.remote.ui.configlist(
> +            lfutil.longname, 'patterns', default=[])
> +
> +        return lfsize, lfpats
> diff --git a/hgext/largefiles/proto.py b/hgext/largefiles/proto.py
> --- a/hgext/largefiles/proto.py
> +++ b/hgext/largefiles/proto.py
> @@ -90,6 +90,26 @@ def statlfile(repo, proto, sha):
>          return '2\n'
>      return '0\n'
>  
> +def getlfpatterns(repo, proto):
> +    '''Server command for getting largefile patterns.'''
> +    lfsize = repo.ui.config(lfutil.longname, 'minsize', default=10)
> +    if lfsize:
> +        try:
> +            lfsize = float(lfsize)
> +        except ValueError:
> +            lfsize = 10
> +
> +    lfpats = repo.ui.configlist(
> +        lfutil.longname, 'patterns', default=[])
> +
> +    def generator():
> +        # lfsize is in MB so 6 points after decimal is enough
> +        yield '%0.6f\n' % lfsize
> +        yield '%d\n' % len(lfpats)
> +        for lfpat in lfpats:
> +            yield '%s\n' % lfpat
> +    return wireproto.streamres(generator())
> +
>  def wirereposetup(ui, repo):
>      class lfileswirerepository(repo.__class__):
>          def putlfile(self, sha, fd):
> @@ -158,6 +178,28 @@ def wirereposetup(ui, repo):
>                  # either way, consider it missing.
>                  yield 2
>  
> +        def getlfpatterns(self):
> +            stream = self._callstream("getlfpatterns")
> +            lfsize = stream.readline()
> +            try:
> +                lfsize = float(lfsize)
> +            except ValueError:
> +                self._abort(error.ResponseError(_("unexpected response:"),
> +                                                lfsize))
> +
> +            lfpats = []
> +            lfpatscount = stream.readline()
> +            try:
> +                lfpatscount = int(lfpatscount)
> +            except ValueError:
> +                self._abort(error.ResponseError(_("unexpected response:"),
> +                                                lfpatscount))
> +
> +            for i in range(lfpatscount):
> +                lfpats.append(stream.readline().rstrip())
> +
> +            return lfsize, lfpats
> +
>      repo.__class__ = lfileswirerepository
>  
>  # advertise the largefiles=serve capability
> diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
> --- a/hgext/largefiles/reposetup.py
> +++ b/hgext/largefiles/reposetup.py
> @@ -23,6 +23,7 @@ from mercurial import (
>  from . import (
>      lfcommands,
>      lfutil,
> +    storefactory,
>  )
>  
>  def reposetup(ui, repo):
> @@ -350,6 +351,14 @@ def reposetup(ui, repo):
>              actualfiles += regulars
>              return actualfiles
>  
> +        def checkpush(self, pushop):
> +            super(lfilesrepo, self).checkpush(pushop)
> +            store = storefactory.openstore(pushop.repo, pushop.remote)
> +            revs = pushop.revs
> +            if revs is None:
> +                revs = pushop.repo.revs('all()')
> +            store.checkfiletopush(revs)
> +
>      repo.__class__ = lfilesrepo
>  
>      # stack of hooks being executed before committing.
> diff --git a/hgext/largefiles/uisetup.py b/hgext/largefiles/uisetup.py
> --- a/hgext/largefiles/uisetup.py
> +++ b/hgext/largefiles/uisetup.py
> @@ -162,6 +162,7 @@ def uisetup(ui):
>      wireproto.commands['putlfile'] = (proto.putlfile, 'sha')
>      wireproto.commands['getlfile'] = (proto.getlfile, 'sha')
>      wireproto.commands['statlfile'] = (proto.statlfile, 'sha')
> +    wireproto.commands['getlfpatterns'] = (proto.getlfpatterns, '')
>  
>      # ... and wrap some existing ones
>      wireproto.commands['capabilities'] = (proto.capabilities, '')
> @@ -173,6 +174,7 @@ def uisetup(ui):
>      hgweb_mod.perms['putlfile'] = 'push'
>      hgweb_mod.perms['getlfile'] = 'pull'
>      hgweb_mod.perms['statlfile'] = 'pull'
> +    hgweb_mod.perms['getlfpatterns'] = 'pull'
>  
>      extensions.wrapfunction(webcommands, 'decodepath', overrides.decodepath)
>  
> diff --git a/hgext/largefiles/wirestore.py b/hgext/largefiles/wirestore.py
> --- a/hgext/largefiles/wirestore.py
> +++ b/hgext/largefiles/wirestore.py
> @@ -37,3 +37,6 @@ class wirestore(remotestore.remotestore)
>              batch.statlfile(hash)
>          batch.submit()
>          return dict(zip(hashes, batch.results()))
> +
> +    def _getlfpatterns(self):
> +        return self.remote.getlfpatterns()
> diff --git a/tests/test-largefiles-wireproto.t b/tests/test-largefiles-wireproto.t
> --- a/tests/test-largefiles-wireproto.t
> +++ b/tests/test-largefiles-wireproto.t
> @@ -161,7 +161,6 @@ largefiles clients refuse to push largef
>    $ cat ../hg.pid >> $DAEMON_PIDS
>    $ hg push http://localhost:$HGPORT
>    pushing to http://localhost:$HGPORT/
> -  searching for changes
>    abort: http://localhost:$HGPORT/ does not appear to be a largefile store
>    [255]
>    $ cd ..
> @@ -444,4 +443,70 @@ a large file from the server rather than
>    $ rm hg.pid access.log
>    $ killdaemons.py
>  
> +Before pushing revisions to the remote store it should check if
> +they dont contain files that are enforced to be large file on remote
> +store.
> +
> +  $ hg init enforcingserver
> +  $ cat >> enforcingserver/.hg/hgrc <<EOF
> +  > [extensions]
> +  > largefiles=
> +  > [largefiles]
> +  > # Above 7 bytes it should be large file
> +  > minsize=0.000007 
> +  > patterns=glob:*.jpg
> +  > usercache=${USERCACHE}
> +  > [web]
> +  > push_ssl = false
> +  > allow_push = *
> +  > EOF
> +
> +  $ hg serve -R enforcingserver -d -p $HGPORT --pid-file hg.pid
> +
> +  $ hg clone http://localhost:$HGPORT encorcingclient
> +  no changes found
> +  updating to branch default
> +  0 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  $ cd encorcingclient
> +
> +If user want to push revision with file below 10 bytes everything is fine
> +
> +  $ echo "AAAA" >> a
> +  $ hg add a
> +  $ hg commit -m "a"
> +  Invoking status precommit hook
> +  A a
> +  $ rm "${USERCACHE}"/*
> +  $ hg push
> +  pushing to http://localhost:$HGPORT/
> +  searching for changes
> +  remote: adding changesets
> +  remote: adding manifests
> +  remote: adding file changes
> +  remote: added 1 changesets with 1 changes to 1 files
> +
> +If user wants to push revision with file above 10 bytes it aborts
> +
> +  $ echo "BBBBBBBBBBBBB" >> b
> +  $ hg add b
> +  $ hg commit -m "b"
> +  Invoking status precommit hook
> +  A b
> +  $ hg push
> +  pushing to http://localhost:$HGPORT/
> +  abort: Files b in revision 1 should be marked as largefiles
> +  [255]
> +
> +If user wants to push revision with file that match pattern for
> +largefiles in server it aborts as well
> +
> +  $ echo "BB" > b
> +  $ hg mv b b.jpg
> +  $ hg commit --amend -m "b"
> +  saved backup bundle to $TESTTMP/encorcingclient/.hg/strip-backup/3705bb1eeaab-aa2c33ea-amend-backup.hg (glob)
> +  $ hg push
> +  pushing to http://localhost:$HGPORT/
> +  abort: Files b.jpg in revision 1 should be marked as largefiles
> +  [255]
> +
>  #endif
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list