Bug 3843 - Python stuck at 100% CPU rebasing mq patches over rename
Summary: Python stuck at 100% CPU rebasing mq patches over rename
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: rebase (show other bugs)
Version: earlier
Hardware: Macintosh Mac OS
: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-25 21:05 UTC by Matt N.
Modified: 2017-11-01 18:05 UTC (History)
5 users (show)

See Also:
Python Version: ---


Attachments
gzipped output from: hg rebase --debug -v -s qbase -d tip (136.30 KB, application/x-gzip)
2013-02-26 18:54 UTC, Matt N.
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matt N. 2013-02-25 21:05 UTC
Steps to reproduce:
1) hg clone -r 1f8d24cee3c7 https://hg.mozilla.org/mozilla-central && cd mozilla-central
2) hg qimport https://bugzilla.mozilla.org/attachment.cgi?id=717451
3) hg qpush

> $ hg glog -l 2 --style compact
> @  122743[attachment.cgi?id=717451,qbase,qtip,tip]:122731   4bc7ec0db5e2   > 2013-02-25 17:06 -0800   mozilla
> |    Bug 738491 - Implement the Australis tab shape for Windows. r=dao
> |
> | o  122742   1f8d24cee3c7   2013-02-22 22:23 +0100   ttaubert
> | |    Backed out changeset 4d74fe8e6560 (bug 842511)
> | |

4) hg pull --rebase -r 7318b2b26843
> added 1 changesets with 1517 changes to 1517 files (+1 heads)
> merging browser/base/content/browser.xul
> merging browser/themes/windows/browser-aero.css and browser/themes/winstripe/browser-aero.css to browser/themes/windows/browser-aero.css
> merging browser/themes/windows/browser.css and browser/themes/winstripe/browser.css to browser/themes/windows/browser.css
   Merge tool (diffmerge) opens and suppose I choose everything from the left
> merging browser/themes/windows/jar.mn and browser/themes/winstripe/jar.mn to browser/themes/windows/jar.mn
> tool diffmerge can't handle binary
> tool diffmerge can't handle binary
>  no tool found to merge toolkit/themes/windows/global/icon/close.png
> keep (l)ocal or take (o)ther?

5) Choose local or other (doesn't make a difference)
> merging toolkit/themes/windows/global/inContentUI.css and toolkit/themes/winstripe/global/inContentUI.css to toolkit/themes/windows/global/inContentUI.css

Python uses 100% CPU on a core and doesn't seem to complete.  Perhaps I didn't wait long enough for it to complete since there were 1517 changes but they were mostly renames so I don't think it should take more than 30 minutes.

I'm going to try leave it going this time. I started at 5:20 PM PST and it's still going 45 minutes later.

I'm using hg 2.5.1 but the problem also seemed to happen on 2.4.
Comment 1 Matt N. 2013-02-25 23:48 UTC
It's been a few hours now and Python is still hovering around 100% CPU usage so I'm going to interrupt it.
Comment 2 Matt N. 2013-02-26 00:04 UTC
I finally got the following bit of output before I stopped the rebase:
> warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabActiveMiddle.png' copied from 'browser/themes/winstripe/tabbrowser/tabActiveMiddle.png'!

That is a file that is added in the patch in step 2.

I'll let the rebae go for a few more hours since the above message leads me to believe it's making progress.
Comment 3 Matt Mackall 2013-02-26 12:07 UTC
(In reply to comment #0)
(In reply to comment #0)
> Steps to reproduce:
> 1) hg clone -r 1f8d24cee3c7 https://hg.mozilla.org/mozilla-central && cd
> mozilla-central
> 2) hg qimport https://bugzilla.mozilla.org/attachment.cgi?id=717451
> 3) hg qpush
> 
> > $ hg glog -l 2 --style compact
> > @  122743[attachment.cgi?id=717451,qbase,qtip,tip]:122731   4bc7ec0db5e2   > 2013-02-25 17:06 -0800   mozilla
> > |    Bug 738491 - Implement the Australis tab shape for Windows. r=dao
> > |
> > | o  122742   1f8d24cee3c7   2013-02-22 22:23 +0100   ttaubert
> > | |    Backed out changeset 4d74fe8e6560 (bug 842511)
> > | |

Ok, I'm confused.

When you do step (1), you should get changeset 1f8d24 checked out. At that point, qimport/qpush should create a linear descendant of that change.

I think you actually had this checked out before the qpush (my numbering is different):

| o  122812   e48e1afb1899   2013-02-24 14:15 -0800   nnethercote
| |    Bug 843999 - Shrink the property cache.  r=bhackett.
 
..and this changeset appears to be on an anonymous branch containing a few hundred changesets (fx team mean anything to you?).

> 4) hg pull --rebase -r 7318b2b26843
> > added 1 changesets with 1517 changes to 1517 files (+1 heads)

We pull in one changeset with a massive amount of renaming...

..and then start rebasing hundreds of apparently already-public changesets on it.

That seems like something that a) you definitely should not be doing and b) something that phases should have prevented you from doing:

$ hg rebase --debug -b . -d 7318b
abort: can't rebase immutable changeset e9ef1f1aa47d
(see hg help phases for details)

But if it is allowed somehow, it's not surprising it takes a long time. The huge number of renames means we're doing a lot more work for each changeset than in a typical rebase. At 30 seconds per merge, we could be at this for hours easily.

Setting to NEED_EXAMPLE: what is going on here?
Comment 4 Siddharth Agarwal 2013-02-26 13:43 UTC
fx-team is likely http://hg.mozilla.org/integration/fx-team/
Comment 5 Siddharth Agarwal 2013-02-26 14:01 UTC
Yes, the STR as described abort correctly with a "can't rebase immutable changeset" error. fx-team is also publishing, so I really don't know what's going on.
Comment 6 Matt Mackall 2013-02-26 18:18 UTC
So, the open questions here are:

a) why was this rebase allowed? (manual use of 'hg phase --force --draft'?)
-> Matt N, Pierre-Yves

Matt, can you do 'hg log -qr "draft()"' on this repo?

b) what's the significance of the "can't find ancestor" message? (sounds like a bug)
-> sid0 might want to take a look

We also might want to take a peek at:
c) is there something interesting happening here performance-wise 
-> sid0? some other FB person?

I would not be at all surprised if this took a long time, but perhaps not several hours. We might want a progress bar or something for large rebases. Here that would make it clear that an unexpectedly large operation was happening.
Comment 7 Matt N. 2013-02-26 18:54 UTC
Created attachment 1716 [details]
gzipped output from: hg rebase --debug -v -s qbase -d tip

The rebase completed sometime in the night with the following output for the other new files in the mq patch:
warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabBackgroundEnd.png' copied from 'browser/themes/winstripe/tabbrowser/tabBackgroundEnd.png'!
warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabBackgroundMiddle.png' copied from 'browser/themes/winstripe/tabbrowser/tabBackgroundMiddle.png'!
warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabBackgroundStart.png' copied from 'browser/themes/winstripe/tabbrowser/tabBackgroundStart.png'!
warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabEndStroke.png' copied from 'browser/themes/winstripe/tabbrowser/tabEndStroke.png'!  
warning: can't find ancestor for 'browser/themes/windows/tabbrowser/tabStartStroke.png' copied from 'browser/themes/winstripe/tabbrowser/tabStartStroke.png'!

(In reply to comment #3)
Sorry, I messed up the revision numbers in my example so let's start over.
The important point is that rebasing the mq patch on top of mozilla-central after 7318b2b26843 landed takes many hours.  It seems like this is because hg is searching for ancestors of new files in the mq patch.  Correct me if I'm wrong but, this part seems unnecessary since they are *new* files.

Steps to reproduce:
1) hg clone -r cd4402b41864 https://hg.mozilla.org/mozilla-central && cd mozilla-central
2) hg qimport https://bugzilla.mozilla.org/attachment.cgi?id=717451 -n a717451
3) hg qpush

> $ hg glog -l 2 --style compact
> @  122640[a717451,qbase,qtip,tip]   69d1c8369a7b   2013-02-26 15:37 -0800   mozilla
> |    Bug 738491 - Implement the Australis tab shape for Windows. r=dao
> |
> o  122639[qparent]   cd4402b41864   2013-02-23 12:56 +0100   dao
> |    Bug 842442 - Return NO_DEVICES_FOUND error when only video or only audio is requested with no suitable device available. r=dolske
> |

4) hg pull
> added 351 changesets with 3261 changes to 2720 files (+1 heads)
> (run 'hg heads' to see heads, 'hg merge' to merge)
5) hg rebase --debug -v -s qbase -d tip
(see attachment for output)
 
6) Choose local or other when asked about close.png (doesn't make a difference)
> merging toolkit/themes/windows/global/inContentUI.css and toolkit/themes/winstripe/global/inContentUI.css to toolkit/themes/windows/global/inContentUI.css

7) Wait for hours for this to complete.
Comment 8 Matt N. 2013-02-26 18:56 UTC
(In reply to comment #6)
> a) why was this rebase allowed? (manual use of 'hg phase --force --draft'?)
> -> Matt N, Pierre-Yves

Ignore this, it was my bad STR only.  I was trying to simplify too much and made some bad assumptions.
Comment 9 Matt Mackall 2013-02-26 20:42 UTC
No, they're not new files:

$ hg log -vpgr 7318b2b26843 | grep "rename from"  
   1507    4521   99504

But even so, if we're only rebasing one revision, "hours" is indeed excessive.

I've simplified this to:

1) hg up 7318b2b^
2) hg import https://bugzilla.mozilla.org/attachment.cgi?id=717451
3) hg rebase -s tip -d 7318b2b^ --tool internal:local -y --debug --debugger

Notes:
- doesn't depend on mq
- doesn't trigger with trivial patch
- doesn't trigger with merge + commit
- happens during the _commit_ stage of the rebase (the merge itself takes a couple seconds)

In particular, it gets stuck here:

http://www.selenic.com/hg/file/c8c3887a24c1/mercurial/localrepo.py#l1087

The problem seems to be that:

- patch introduces ~8 new files in a directory that got renamed
- rebase calls merge
- merge renames the new file, records a copy marker
- but the copy marker points to a file in p2
- and we forget about p2 after the merge because it's a rebase
- and then we try to commit
- so commit makes a desperate attempt to find where the file was copied from
- by walking the history back from p1
- and unpacking every single manifest looking for it
- for each file moved in the rebase

..which means it ends up unpacking something like 4TB of manifest data and probably taking something like 20 days. Awesome.

This search code was introduced here:

 http://www.selenic.com/hg/annotate/0d714a48ab53/mercurial/localrepo.py#l720

to fix this bug:

 http://bz.selenic.com/show_bug.cgi?id=1175

The fix here should be for rebase to drop copy markers for copy sources that only exist in p2 before committing. But we should probably also try to bound this search too.
Comment 10 Matt Mackall 2013-02-26 22:17 UTC
Try this patch:

diff -r c8c3887a24c1 mercurial/localrepo.py
--- a/mercurial/localrepo.py	Fri Feb 22 16:40:27 2013 -0600
+++ b/mercurial/localrepo.py	Tue Feb 26 21:16:26 2013 -0600
@@ -713,6 +713,10 @@
             for f in copies:
                 if f not in pctx and copies[f] in pctx:
                     self.dirstate.copy(copies[f], f)
+            if p2 == nullid:
+                for f, s in sorted(self.dirstate.copies().items()):
+                    if f not in pctx and s not in pctx:
+                        self.dirstate.copy(None, f)
 
     def filectx(self, path, changeid=None, fileid=None):
         """changeid can be a changeset revision, node, or tag.
Comment 11 HG Bot 2013-02-28 23:00 UTC
Fixed by http://selenic.com/repo/hg/rev/5b7175377bab
Matt Mackall <mpm@selenic.com>
setparents: drop copies from dropped p2 (issue3843)

(please test the fix)