3402 – HgWeb.cgi Hanging During Push

Status:	RESOLVED ARCHIVED

Alias:	None

Product:	Mercurial
Classification:	Unclassified
Component:	hgweb
Version:	unspecified
Hardware:	All All

Importance:	normal bug
Assignee:	Bugzilla

Reported:	2012-04-25 16:52 UTC by Aaron Jensen
Modified:	2014-07-31 13:22 UTC
CC List:	4 users

Python Version:	---

Description Aaron Jensen 2012-04-25 16:52 UTC

We’re running Mercurial 2.1.1 under IIS using CGI on Windows 2008.  We have two server-side hooks (written in PowerShell) that run on pretxnchangegroup and can take up to a minute to run.

We’re noticing that if developer #2 pushes while developer #1 is pushing (his python.exe CGI process has locked the repo and our hooks are running), developer #2’s CGI process sits and waits for developer #1’s push to finish, as expected.  However, once developer #1’s push succeeds, developer #2’s CGI process doesn’t detect that the repo is available/unlocked, and never locks the repo or runs any hooks.  It just hangs, using no CPU and not growing in memory.

I would expect developer #2 to get a “waiting for lock” message, but the last message Mercurial outputs is “searching for changes”.  Hitting CTRL+C doesn’t stop the push.  Developer #2 has to kill hg.exe, or I have to log into our Mercurial server and kill developer #2’s CGI process.  No repository corruption occurs on either the client or the server.

# Steps to Reproduce

On the server:

> hg init push-hangs
> cd push-hangs
> echo '[hooks]' > .hg\hgrc
> echo 'pretxnchangegroup.sleep = echo. | powershell -NoProfile -Command "Start-Sleep -Seconds 10"' >> .hg\hgrc

On the client:
> hg clone http://server/push-hangs
no changes found
updating to branch default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
> hg clone http://server/push-hangs push-hangs2
no changes found
updating to branch default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
> cd push-hangs
> echo '' > a.txt
> hg add a.txt
> hg commit -m "Adding file."
> cd ..\push-hangs2
> echo '' > b.txt
> hg add b.txt
> hg commit -m "Adding file."
> hg push

While that is pushing, within ten seconds, open a new console:
> cd push-hangs
> hg push

Notice that when the first push finishes, the second hangs and never finishes.
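
Since CTRL+C does not stop the hung push, a small client-side watchdog can make the reproduction less painful. The following is only a sketch written for Python 3; the helper name `run_with_timeout` and the timeout values are assumptions for illustration, not part of Mercurial. It only kills the client process it started.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # force-terminate; the report notes CTRL+C had no effect
        proc.wait()  # reap the killed process
        return None


# Stand-in commands so the sketch runs without a Mercurial server:
# one child exits immediately, the other sleeps past the timeout.
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
stuck = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(60)"], 1
)
print(quick, stuck)
```

For the scenario above, the second push could be started as `run_with_timeout(["hg", "push"], 120)` from inside push-hangs, where 120 seconds is an arbitrary limit chosen to exceed the hook's run time.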

Comment 1 Aaron Jensen 2012-04-25 18:04 UTC

I hacked hgweb.cgi to have it output stderr to a file so I could see what's going on:

     import os
     import sys
     # one log file per CGI process, named by PID
     errlog = "C:/inetpub/logs/httperr.%d.log" % os.getpid()
     sys.stderr = open(errlog, "w")
     sys.stderr.write("Writing to standard error.\n")
     sys.stderr.flush()
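
The same idea can be wrapped in a helper that opens the log line-buffered, which avoids calling flush() after every write. This is a sketch written for Python 3; the function name `redirect_stderr` and the scratch directory are assumptions for illustration, and the demo restores the original stream when done.

```python
import os
import sys
import tempfile


def redirect_stderr(log_dir):
    """Send sys.stderr to a per-process log file and return the file's path."""
    path = os.path.join(log_dir, "httperr.%d.log" % os.getpid())
    # buffering=1 requests line buffering for a text-mode file, so each
    # complete line is written out without an explicit flush() call.
    sys.stderr = open(path, "a", buffering=1)
    return path


# Demo against a scratch directory (hypothetical location; the report
# used C:/inetpub/logs). The original stream is restored afterwards.
saved_stderr = sys.stderr
log_path = redirect_stderr(tempfile.mkdtemp())
try:
    sys.stderr.write("Writing to standard error.\n")
finally:
    sys.stderr.close()
    sys.stderr = saved_stderr

with open(log_path) as log_file:
    logged = log_file.read()
```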

Per suggestion from mpm on the users mailing list, I added a bunch of debugging statements to wireproto.py:

    try:
        proto.getfile(fp)
        sys.stderr.write("%d: at step 1\n" % os.getpid()); sys.stderr.flush()
        lock = repo.lock()
        sys.stderr.write("%d: at step 2\n" % os.getpid()); sys.stderr.flush()
        try:
            if not check_heads():
                sys.stderr.write("%d: at step 3\n" % os.getpid()); sys.stderr.flush()
                # someone else committed/pushed/unbundled while we
                # were transferring data
                return pusherr('unsynced changes')

            # push can proceed
            sys.stderr.write("%d: at step 4\n" % os.getpid()); sys.stderr.flush()
            fp.seek(0)
            sys.stderr.write("%d: at step 5\n" % os.getpid()); sys.stderr.flush()
            gen = changegroupmod.readbundle(fp, None)

            try:
                sys.stderr.write("%d: at step 6\n" % os.getpid()); sys.stderr.flush()
                r = repo.addchangegroup(gen, 'serve', proto._client())
            except util.Abort, inst:
                sys.stderr.write("abort: %s\n" % inst); sys.stderr.flush()
        finally:
            sys.stderr.write("%d: at step 7\n" % os.getpid()); sys.stderr.flush()
            lock.release()
        sys.stderr.write("%d: at step 8\n" % os.getpid()); sys.stderr.flush()
        return pushres(r)

And this is the output from the hung process: 

9876: at step 1
9876: at step 2
9876: at step 3
9876: at step 7

It looks like it's hanging on the call to lock.release().  Should it even be getting the lock in the first place?
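
A note on reading that trace: in the instrumented code above, the lock is taken (step 2) before check_heads() is called again, and the `return pusherr(...)` after step 3 sits inside the `try`, so the `finally` block (step 7) runs before the function returns. Step 8 is never reached on that path even when nothing hangs. The log therefore seems to show that the process got at least as far as the line before lock.release(), but by itself it cannot tell whether release() hung or the stall came later. A toy sketch of that control flow (the function name and arguments are hypothetical stand-ins, not Mercurial code):

```python
def unbundle_sketch(heads_match, trace):
    """Toy stand-in for the instrumented code path; records step numbers."""
    trace.append(1)  # bundle received
    trace.append(2)  # lock acquired
    try:
        if not heads_match:
            trace.append(3)  # heads changed while waiting for the lock
            return "unsynced changes"
        trace.append(4)
        trace.append(5)
        trace.append(6)  # the changegroup would be added here
    finally:
        trace.append(7)  # runs on the early return as well
    trace.append(8)  # only reached when the try block falls through
    return "ok"


stale = []
stale_result = unbundle_sketch(False, stale)
fresh = []
fresh_result = unbundle_sketch(True, fresh)
print(stale, stale_result)
print(fresh, fresh_result)
```

Adding another debug write immediately after lock.release(), still inside the `finally` block, would separate the two cases.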

Comment 2 Bugzilla 2012-05-12 09:30 UTC


--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:30 EDT  ---

This bug was previously known as _bug_ 3401 at http://mercurial.selenic.com/bts/issue3401

Comment 3 Matt Mackall 2014-07-25 17:22 UTC

Bulk close: no activity for >2 years -> WONTFIX

Comment 4 Matt Mackall 2014-07-31 13:22 UTC

Bulk change recent WONTFIX -> new, more descriptive ARCHIVED state (sorry for the spam)