[PATCH] subrepo: do not push "clean" subrepos when the parent repo is pushed
Matt Mackall
mpm at selenic.com
Wed Feb 13 19:31:55 CST 2013
On Thu, 2013-02-14 at 01:06 +0100, Angel Ezquerra wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra at gmail.com>
> # Date 1360795816 -3600
> # Node ID 26276460d54aecdeb107c82c4e3f2ca7c0c6a8b3
> # Parent 55b9b294b7544a6a144f627f71f4b770907d5a98
> subrepo: do not push "clean" subrepos when the parent repo is pushed
>
> A clean subrepo is defined as one that has not had its dirstate, bookmarks or
> phases modified.
>
> This patch works by adding a "clean" method to subrepos. In the case of
> mercurial subrepos, this method calculates a "stamp" (i.e. a set of file hashes)
> of the repository state at the time of push with a similar "stamp" that was
> stored on a file when the subrepo was cloned or pushed to a given remote target.
> If the stamps match the subrepo has no changes that must be pushed to the target
> repository and thus the push can be skipped.
>
> Note that we calculate the stamp file by calculating hashes for several key
> repository files, such as the dirstate, the bookmarks file and the phaseroots
> file. This means that our "clean" detection is not perfect, in the sense that
> if the working directory has been updated to a different revision we will
> assume that the subrepo is not clean. However, if we update to another revision
> and back to the original revision the clean() method will correctly detec the
> subrepo as being clean.
Why is the dirstate interesting? I would posit that we're only
interested in things that we'd push or pull. Dirstate being modified
should not force us to push.
> Also note that a subrepo being "clean" is not the opposite of it being "dirty".
> A subrepo is dirty if it updated to a different revision that the one that is
> pointed to by the subrepo parent or if its working directory is not clean. This
> is a different concept.
Ok, let's give it a different name. How about storeclean() so it's
explicit it's about the store?
I think this needs to be (at least) a few different patches:
1) introduce the helper functions
2) record clean state on clone/pull
3) check clean state on push
> diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
> --- a/mercurial/subrepo.py
> +++ b/mercurial/subrepo.py
> @@ -300,6 +300,16 @@
>
> class abstractsubrepo(object):
>
> + def clean(self):
> + """
> + returns true if the repository has not changed since it was last
> + cloned or pulled.
> + Note that this is very different and definitely not the opposite
> + of the repository being "dirty", which is related to having changes
> + on the working directory or the current revision.
> + """
> + return False
> +
> def dirty(self, ignoreupdate=False):
> """returns true if the dirstate of the subrepo is dirty or does not
> match current stored state. If ignoreupdate is true, only check
> @@ -426,6 +436,73 @@
> self._repo.ui.setconfig('ui', '_usedassubrepo', 'True')
> self._initrepo(r, state[0], create)
>
> + def clean(self, path):
> + """
> + returns true if the repository has not changed since it was last
> + cloned or pulled.
> + Note that this is very different and definitely not the opposite
> + of the repository being "dirty", which is related to having changes
> + on the working directory or the current revision.
> + """
Duplicate docs.
> + return self._calcrepostamp(path) == self._readrepostamp(path)
This is suboptimal, especially if any of the files are large. It'd be
better to be able to break after we find the first changed file.
> + def _getfilestamp(self, filename):
> + data = ''
> + if os.path.exists(filename):
> + fd = open(filename)
> + data = fd.read()
> + fd.close()
> + return util.sha1(data).hexdigest()
Most of this doesn't want to be member functions. We'll need it for git.
> + def _calcrepostamp(self, remotepath):
> + '''calculate a unique "stamp" for the current repository state
> +
> + This method is used to to detect when there are changes that may
> + require a push to a given remote path.'''
> + filelist = ('dirstate', 'bookmarks', 'store/phaseroots')
> + stamp = ['# %s\n' % remotepath]
> + lock = self._repo.lock()
> + try:
> + for relname in filelist:
> + absname = os.path.normpath(self._repo.join(relname))
> + stamp.append('%s = %s\n' % (absname, self._getfilestamp(absname)))
> + finally:
> + lock.release()
> + return stamp
> +
> + def _getstampfilename(self, remotepath):
> + '''get a unique filename for the remote repo stamp'''
> + fname = util.sha1(remotepath).hexdigest()
> + return self._repo.join(os.path.join('stamps', fname))
Probably don't want a 40-character name here. This should go
in .hg/cache/ on hg repos and have a less generic name than stamp.
> + def _readrepostamp(self, remotepath):
> + '''read an existing remote repository stamp'''
> + stampfile = self._getstampfilename(remotepath)
> + if not os.path.exists(stampfile):
> + return ''
> + fd = open(stampfile, 'r')
> + stamp = fd.readlines()
> + fd.close()
> + return stamp
> +
> + def _updaterepostamp(self, remotepath):
> + '''
> + Calc the current repo stamp saving it into a remote repo stamp file
> + Each remote repo requires its own stamp file, because a subrepo may
> + be clean versus a given remote repo, but not versus another.
> + '''
> + # save it to the clean file
> + # We should lock the repo
> + stampfile = self._getstampfilename(remotepath)
> + # [FIXME] should lock the repo? it is already locked by _calcrepostamp
No, the lock should be in the callers of _calcrepostamp.
> + stamp = self._calcrepostamp(remotepath)
> + stampdir = self._repo.join('stamps')
> + if not os.path.exists(stampdir):
> + util.makedir(stampdir, True)
> + fd = open(stampfile, 'w')
> + fd.writelines(stamp)
> + fd.close()
> +
> @annotatesubrepoerror
> def _initrepo(self, parentrepo, source, create):
> self._repo._subparent = parentrepo
> @@ -544,12 +621,17 @@
> update=False)
> self._repo = cloned.local()
> self._initrepo(parentrepo, source, create=True)
> + self._updaterepostamp(srcurl)
> else:
> self._repo.ui.status(_('pulling subrepo %s from %s\n')
> % (subrelpath(self), srcurl))
> + cleansub = self.clean(srcurl)
> self._repo.pull(other)
> bookmarks.updatefromremote(self._repo.ui, self._repo, other,
> srcurl)
> + if cleansub:
> + # keep the repo clean after pull
> + self._updaterepostamp(srcurl)
>
> @annotatesubrepoerror
> def get(self, state, overwrite=False):
> @@ -557,6 +639,9 @@
> source, revision, kind = state
> self._repo.ui.debug("getting subrepo %s\n" % self._path)
> hg.updaterepo(self._repo, revision, overwrite)
> + srcurl = _abssource(self._repo)
> + if self.clean(srcurl):
> + self._updaterepostamp(srcurl)
>
> @annotatesubrepoerror
> def merge(self, state):
> @@ -599,10 +684,20 @@
> return False
>
> dsturl = _abssource(self._repo, True)
> + if not force:
> + if self.clean(dsturl):
> + self._repo.ui.status(
> + _('no changes made to subrepo %s since last push to %s\n')
> + % (subrelpath(self), dsturl))
> + return None
> self._repo.ui.status(_('pushing subrepo %s to %s\n') %
> (subrelpath(self), dsturl))
> other = hg.peer(self._repo, {'ssh': ssh}, dsturl)
> - return self._repo.push(other, force, newbranch=newbranch)
> + res = self._repo.push(other, force, newbranch=newbranch)
> +
> + # the repo is now clean
> + self._updaterepostamp(dsturl)
> + return res
>
> @annotatesubrepoerror
> def outgoing(self, ui, dest, opts):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel
mailing list