[PATCH 2 of 2 RFC] extdiff: use -S to archive the full repo

Matt Harbison mharbison72 at gmail.com
Sun Feb 22 16:46:04 CST 2015


On Wed, 18 Feb 2015 14:42:17 -0500, Mathias De Maré  
<mathias.demare at gmail.com> wrote:

> On Fri, Feb 13, 2015 at 4:45 AM, Matt Harbison <mharbison72 at gmail.com>
> wrote:
>
>>
>> Sorry for the delayed response.  I had to reboot last night, and Windows
>> decided to install 11 updates... and it still wasn't done 12 hours later
>> when I left for work this morning.
>>
> As you might have noticed by now, I'm a bit strapped for time myself (so
> apologies for an even longer delay).
>
>>
>> I put together some tests to see if I could figure out the differences
>> between what I was seeing and what you were.  I had assumed that adding  
>> a
>> file to a subrepo and modifying one in a subrepo wouldn't cause a
>> difference in how the recursion is handled- but that doesn't appear to  
>> be
>> the case.  This can be applied on top of the 3rd patch in this chain.
>> There's either a subtle subrepo bug, or (more likely), I'm missing
>> something.
>>
> I've modified your below tests to clarify the bug a bit. Have a look at  
> the
> difference between my patch below and your patch to see. The issue does  
> not
> appear for only one file, because extdiff behaves differently for one  
> file
> in the working directory versus multiple!
> I only modified your patch for hg subrepos, but same goes for git and
> subversion repos.

Yep, good point.

> # HG changeset patch
> # User Matt Harbison <matt_harbison at yahoo.com>
> # Date 1423795917 18000
> #      Don Feb 12 21:51:57 2015 -0500
> # Node ID ed6dfa605f0451605fc42658c5492d29e570eac7
> # Parent  e7b1097a59713d2eb813c09013d71f7e1bb676ff
> extdiff: basic hg subrepo testing
>
> What extdiff sees when a subrepo has a modified file, and when it has an
> added
> file is different for reasons unknown.  A modified file in a git subrepo
> also
> works as we would like (i.e. the modified file is archived), but  
> apparently
> it
> doesn't archive an added file either.
>
> Changes by Mathias: modified files are NOT seen by extdiff either! They  
> are
> when
> it's just one, but that's because archive is only called for MORE than  
> one
> file!
>
> diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
> --- a/mercurial/subrepo.py
> +++ b/mercurial/subrepo.py
> @@ -658,6 +658,8 @@ class hgsubrepo(abstractsubrepo):
>      def diff(self, ui, diffopts, node2, match, prefix, **opts):
>          try:
>              node1 = node.bin(self._state[1])
> +#            print('hgsub diff: "%s"' % self._state[1])
> +
>              # We currently expect node2 to come from substate and be
>              # in hex format
>              if node2 is not None:
> @@ -676,6 +678,7 @@ class hgsubrepo(abstractsubrepo):
>          total = abstractsubrepo.archive(self, archiver, prefix, match)
>          rev = self._state[1]
>          ctx = self._repo[rev]
> +#        print('hgsub archive: "%s, ctx type is %s"' % (rev, type(ctx)))
>          for subpath in ctx.substate:
>              s = subrepo(ctx, subpath)
>              submatch = matchmod.narrowmatcher(subpath, match)
> diff --git a/tests/test-subrepo-extdiff.t b/tests/test-subrepo-extdiff.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-subrepo-extdiff.t
> @@ -0,0 +1,111 @@
> +  $ cat >> $HGRCPATH <<EOF
> +  > [extensions]
> +  > extdiff=
> +  > EOF
> +
> +  $ hg init root
> +  $ echo "hgsub = hgsub" > root/.hgsub
> +  $ cd root
> +  $ hg init hgsub
> +  $ hg add .hgsub
> +  $ echo test > hgsub/modified.txt
> +  $ echo test2 > hgsub/modified2.txt
> +  $ hg -R hgsub add hgsub/modified.txt
> +  $ hg -R hgsub add hgsub/modified2.txt
> +
> +-------------------------------------------------------------------------------
> +For hg subrepos that have not been locked in, changes are visible to  
> diff,
> but
> +NOT extdiff.  self._state[1] is ''
> +
> +  $ hg status -S
> +  A .hgsub
> +  A hgsub/modified.txt
> +  A hgsub/modified2.txt
> +
> +  $ hg diff -S
> +  diff -r 000000000000 .hgsub
> +  --- /dev/null    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/.hgsub    * (glob)
> +  @@ -0,0 +1,1 @@
> +  +hgsub = hgsub
> +  diff -r 000000000000 hgsub/modified.txt
> +  --- /dev/null    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified.txt    * (glob)
> +  @@ -0,0 +1,1 @@
> +  +test
> +  diff -r 000000000000 hgsub/modified2.txt
> +  --- /dev/null    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified2.txt    * (glob)
> +  @@ -0,0 +1,1 @@
> +  +test2
> +
> +  $ hg extdiff -S
> +  diff -Npru root.000000000000/.hgsub root/.hgsub
> +  --- root.000000000000/.hgsub    1970-01-01 00:00:00* +0000 (glob)
> +  +++ root/.hgsub    * (glob)
> +  @@ -0,0 +1 @@
> +  +hgsub = hgsub
> +  [1]
> +
> +  $ hg ci -qm "modified.txt -> test" -S
> +
> +-------------------------------------------------------------------------------
> +For hg subrepos that *have* been locked in, changes are visible to diff
> +AND extdiff.  self._state[1] is 50a11ebc6587
> +
> +  $ echo 'second mod' > hgsub/modified.txt
> +  $ echo 'second mod' > hgsub/modified2.txt
> +
> +  $ hg status -S
> +  M hgsub/modified.txt
> +  M hgsub/modified2.txt
> +  $ hg diff -S
> +  diff -r 4034090a5971 hgsub/modified.txt
> +  --- a/hgsub/modified.txt    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified.txt    * (glob)
> +  @@ -1,1 +1,1 @@
> +  -test
> +  +second mod
> +  diff -r 4034090a5971 hgsub/modified2.txt
> +  --- a/hgsub/modified2.txt    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified2.txt    * (glob)
> +  @@ -1,1 +1,1 @@
> +  -test2
> +  +second mod
> +  $ hg extdiff -S
> +  [1]
> +
> +-------------------------------------------------------------------------------
> +But adding a file to hgsubrepo causes extdiff to see *nothing*.
> self_state[1]
> +is still 50a11ebc6587, as it was when we had better results in the last
> test.
> +
> +  $ echo 'added' > hgsub/added.txt
> +#  $ hg -R hgsub add hgsub/added.txt
> +  $ hg add hgsub/added.txt
> +
> +  $ hg status -S
> +  M hgsub/modified.txt
> +  M hgsub/modified2.txt
> +  A hgsub/added.txt
> +
> +  $ hg diff -S
> +  diff -r 4034090a5971 hgsub/added.txt
> +  --- /dev/null    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/added.txt    * (glob)
> +  @@ -0,0 +1,1 @@
> +  +added
> +  diff -r 4034090a5971 hgsub/modified.txt
> +  --- a/hgsub/modified.txt    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified.txt    * (glob)
> +  @@ -1,1 +1,1 @@
> +  -test
> +  +second mod
> +  diff -r 4034090a5971 hgsub/modified2.txt
> +  --- a/hgsub/modified2.txt    Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/hgsub/modified2.txt    * (glob)
> +  @@ -1,1 +1,1 @@
> +  -test2
> +  +second mod
> +
> +  $ hg extdiff -S
> +  [1]
> diff --git a/tests/test-subrepo-git.t b/tests/test-subrepo-git.t
> --- a/tests/test-subrepo-git.t
> +++ b/tests/test-subrepo-git.t
> @@ -698,6 +698,16 @@ check differences made by most recent ch
>    +foo
>    +bar
>
> +  $ hg extdiff -S --config extensions.extdiff=
> +  --- nul      1970-01-01 00:00:00 +0000
> +  +++ $TESTTMP/tc/s/foobar     * (glob)
> +  @@ -0,0 +1,4 @@
> +  +woopwoop
> +  +
> +  +foo
> +  +bar
> +  [1]
> +
>    $ hg commit --subrepos -m "Added foobar"
>    committing subrepository s
>    created new head
>
>
>> After further digging, the only time I've seen this case triggered is if
>> the subrepo hasn't been committed to the parent repo yet.  I was hoping
>> this would be used as 'give me the wctx of this subrepo', but it's  
>> not.  I
>> can see how it gets there in subrepo:145.
>>
>> I wonder if context.substate() should be overridden in workingctx to set
>> the state to None, so that repo[None].substate always yields subrepo
>> working directories.  We would need it for a revset symbol that  
>> indicates
>> the working directory too.  It's different from repo['.'].substate, so I
>> don't think it will break anything.  I'll try it and see what happens,  
>> but
>> maybe a subrepo expert can chime in.
>>
>
> I agree that we probably need to move in that direction. Not sure who the
> subrepo experts are, but remarks welcome.
> Below is an example hack that can be applied to the above patch, it shows
> what happens when the substate is also set to None (extdiff works, yay!).
> This of course has nice side-effects like 'hg status -S' not working
> correctly anymore, so I think we're in for a bit more work.

Yeah, after I sent the email, I saw this.  And it crashes diff -S in my  
test case too.  I wasn't able to figure out why, but I only spent a few  
hours on it.  It looks like you hit it too from the patch below.  I'm  
starting to see why I never finished this in the first place...

> # HG changeset patch
> # User Mathias De Maré <mathias.demare at gmail.com>
> # Date 1424287953 -3600
> #      Mit Feb 18 20:32:33 2015 +0100
> # Node ID a1ec50dc0168e2e0798d74efaaf1e10df66f5a63
> # Parent  ed6dfa605f0451605fc42658c5492d29e570eac7
> subrepo: hack in propagation of top-level 'None' context
>
> Previously, subrepo context was determined by looking at
> substate. However, in the case where the context of
> the parent repository is the working directory (repo[None]),
> this leads to incorrect archiving.
>
> This patch is a hack, but it's purely meant to show what type of change
> would be required to fix subrepo archiving.
>
> (Note as a side-effect of this hack that 'hg st -S' is now messed up.
>
> diff --git a/mercurial/archival.py b/mercurial/archival.py
> --- a/mercurial/archival.py
> +++ b/mercurial/archival.py
> @@ -305,6 +305,7 @@ def archive(repo, dest, node, kind, deco
>
>      if subrepos:
>          for subpath in sorted(ctx.substate):
> +#            repo.ui.write("going to archive, toplevel rev: %s\n" %
> ctx.rev())
>              sub = ctx.sub(subpath)
>              submatch = matchmod.narrowmatcher(subpath, matchfn)
>              total += sub.archive(archiver, prefix, submatch)
> diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
> --- a/mercurial/subrepo.py
> +++ b/mercurial/subrepo.py
> @@ -334,8 +334,13 @@ def subrepo(ctx, path):
>
>      pathutil.pathauditor(ctx._repo.root)(path)
>      state = ctx.substate[path]
> +#    ctx._repo.ui.write("substate: %s\n" % str(state))
>      if state[2] not in types:
>          raise util.Abort(_('unknown subrepo type %s') % state[2])
> +    if ctx.rev() is None:
> +        state = list(state)
> +        state[1] = None
> +        state = tuple(state)
>      return types[state[2]](ctx, path, state[:2])
>
>  def newcommitphase(ui, ctx):
> @@ -657,7 +662,10 @@ class hgsubrepo(abstractsubrepo):
>      @annotatesubrepoerror
>      def diff(self, ui, diffopts, node2, match, prefix, **opts):
>          try:
> -            node1 = node.bin(self._state[1])
> +            if self._state[1]:
> +                node1 = node.bin(self._state[1])
> +            else:
> +                node1 = None
>  #            print('hgsub diff: "%s"' % self._state[1])
>
>              # We currently expect node2 to come from substate and be
> @@ -728,6 +736,8 @@ class hgsubrepo(abstractsubrepo):
>
>      def _get(self, state):
>          source, revision, kind = state
> +        if revision is None:
> +            return True
>          if revision in self._repo.unfiltered():
>              return True
>          self._repo._subsource = source
> diff --git a/tests/test-subrepo-extdiff.t b/tests/test-subrepo-extdiff.t
> --- a/tests/test-subrepo-extdiff.t
> +++ b/tests/test-subrepo-extdiff.t
> @@ -19,8 +19,6 @@ NOT extdiff.  self._state[1] is ''
>
>    $ hg status -S
>    A .hgsub
> -  A hgsub/modified.txt
> -  A hgsub/modified2.txt
>
>    $ hg diff -S
>    diff -r 000000000000 .hgsub
> @@ -40,9 +38,8 @@ NOT extdiff.  self._state[1] is ''
>    +test2
>
>    $ hg extdiff -S
> -  diff -Npru root.000000000000/.hgsub root/.hgsub
> -  --- root.000000000000/.hgsub    1970-01-01 00:00:00* +0000 (glob)
> -  +++ root/.hgsub    * (glob)
> +  --- /dev/null    * (glob)
> +  +++ $TESTTMP/root/.hgsub    * (glob)
>    @@ -0,0 +1 @@
>    +hgsub = hgsub
>    [1]
> @@ -73,6 +70,18 @@ AND extdiff.  self._state[1] is 50a11ebc
>    -test2
>    +second mod
>    $ hg extdiff -S
> +  diff -Npru root.a642cdd33cab/hgsub/modified.txt  
> root/hgsub/modified.txt
> +  --- root.a642cdd33cab/hgsub/modified.txt    * (glob)
> +  +++ root/hgsub/modified.txt    * (glob)
> +  @@ -1 +1 @@
> +  -test
> +  +second mod
> +  diff -Npru root.a642cdd33cab/hgsub/modified2.txt  
> root/hgsub/modified2.txt
> +  --- root.a642cdd33cab/hgsub/modified2.txt    * (glob)
> +  +++ root/hgsub/modified2.txt    * (glob)
> +  @@ -1 +1 @@
> +  -test2
> +  +second mod
>    [1]
>
>  -------------------------------------------------------------------------------
> @@ -108,4 +117,21 @@ is still 50a11ebc6587, as it was when we
>    +second mod
>
>    $ hg extdiff -S
> +  diff -Npru root.a642cdd33cab/hgsub/added.txt root/hgsub/added.txt
> +  --- root.a642cdd33cab/hgsub/added.txt    1970-01-01 00:00:00.000000000
> +0000
> +  +++ root/hgsub/added.txt    * (glob)
> +  @@ -0,0 +1 @@
> +  +added
> +  diff -Npru root.a642cdd33cab/hgsub/modified.txt  
> root/hgsub/modified.txt
> +  --- root.a642cdd33cab/hgsub/modified.txt    * (glob)
> +  +++ root/hgsub/modified.txt    * (glob)
> +  @@ -1 +1 @@
> +  -test
> +  +second mod
> +  diff -Npru root.a642cdd33cab/hgsub/modified2.txt  
> root/hgsub/modified2.txt
> +  --- root.a642cdd33cab/hgsub/modified2.txt    * (glob)
> +  +++ root/hgsub/modified2.txt    * (glob)
> +  @@ -1 +1 @@
> +  -test2
> +  +second mod
>    [1]
>
> Side note: I was thinking about setting up a Kallithea instance to work  
> on
> this together without polluting the mailing list too much. What's your
> opinion?

I think mpm wants to keep all discussions on the list, for current  
learning and future archeology purposes.  It's also more likely to get a  
helpful suggestion from a casual reader.

It looks like either queuebot had a hiccup today, or he dumped his entire  
inbox to catchup.  So I'm sort of waiting until crew gets pulled into the  
main repo, so I can rebase/evolve the current mess of branches I have and  
reevaluate.  Once that happens, I'll resend what I have to reboot the  
discussion- it's too hard to follow it all over a several week period of  
deadends and wrong turns.

BTW, a (slightly?) cleaner variant on what you have above is maybe (I  
didn't try this yet) inside subrepo.archive(), instead of going right to  
accessing substate[1], call self.ctx.subrev(path).  committablectx has an  
override that returns None, maybe workingctx should too.  It isn't nice  
that this has to be looked up again (and in each subrepo class), but it is  
a cached property, so no real harm for the non workingctx case?  That  
would sidestep the issue with breaking status and diff (and probably other  
stuff), until that can be sorted out.

The only hurtle that I see is hgsubrepo doesn't have a parent ctx yet.   
I've been meaning to add one, but I can't shake the feeling that it would  
need to be updated outside of __init__.  (Consider what happens if to a  
nested subrepo if you revert its parent subrepo- it seems that its ctx  
would be wrong.)  But by simply adding it, it wouldn't be any more wrong  
than git and svn currently are.

--Matt

> Greetings,
> Mathias


More information about the Mercurial-devel mailing list