Bug 4336 - stderr from ssh not flushed after command
Summary: stderr from ssh not flushed after command
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: 3.1
Hardware: PC Linux
: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-13 23:48 UTC by Gregory Szorc
Modified: 2014-09-04 13:48 UTC (History)
2 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gregory Szorc 2014-08-13 23:48 UTC
I have an extension that's piggybacking on top of `hg push`. It invokes a separate wire protocol command (pushreview) after the unbundle, listkey, and pushkey commands that constitute a built-in push. As part of this command, it does some ui.write()ing.

What I'm seeing is out-of-order printing of messages between OS X and Linux. OS X does the right thing. However, on Linux, I'm seeing the stderr output (the "remote:" prefixed messages sent in return from the unbundle command) printed *after* my command output.

On OS X:

$ hg push --debug
  pushing to ssh://user@dummy/$TESTTMP/server
  running python "/Users/gps/src/hgcustom/version-control-tools/pylib/mercurial-support/dummyssh" user@dummy 'hg -R $TESTTMP/server serve --stdio'
  sending hello command
  sending between command
  remote: 169
  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 reviewboard pullreviews
  remote: 1
  query 1; heads
  sending batch command
  searching for changes
  all remote heads known locally
  preparing listkeys for "phases"
  sending listkeys command
  2 changesets found
  list of changesets:
  775aa9c61e2bacff3f20f2e7fef54062038c8328
  c90aa97490e2048e34287d62f4a98f47acea4bac
  sending unbundle command
  bundling: 1/2 changesets (50.00%)
  bundling: 2/2 changesets (100.00%)
  bundling: 1/2 manifests (50.00%)
  bundling: 2/2 manifests (100.00%)
  bundling: foo1 1/2 files (50.00%)
  bundling: foo2 2/2 files (100.00%)
  remote: adding changesets
  remote: adding manifests
  remote: adding file changes
  remote: added 2 changesets with 2 changes to 2 files
  preparing listkeys for "phases"
  sending listkeys command
  preparing pushkey for "phases:274cf95ba54c5a10b258b1d05a1be63e312e3573"
  sending pushkey command
  try to push obsolete markers to remote
  checking for updated bookmarks
  preparing listkeys for "bookmarks"
  sending listkeys command
  submitting 2 changesets for review
  sending pushreview command

  changeset:  1:775aa9c61e2b
  summary:    Bug 123 - Foo 1
  review:     http://localhost:$HGPORT1/r/2 (pending)

  changeset:  2:c90aa97490e2
  summary:    Bug 123 - Foo 2
  review:     http://localhost:$HGPORT1/r/3 (pending)

  review id:  bz://123/mynick
  review url: http://localhost:$HGPORT1/r/1 (pending)

However on OS X I get the "remote:" prefixed lines later on:

$ hg push --debug
  pushing to ssh://user@dummy/$TESTTMP/server
  running python "/version-control-tools/pylib/mercurial-support/dummyssh" user@dummy 'hg -R $TESTTMP/server serve --stdio'
  sending hello command
  sending between command
  remote: 169
  remote: capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 reviewboard pullreviews
  remote: 1
  query 1; heads
  sending batch command
  searching for changes
  all remote heads known locally
  2 changesets found
  list of changesets:
  775aa9c61e2bacff3f20f2e7fef54062038c8328
  c90aa97490e2048e34287d62f4a98f47acea4bac
  sending unbundle command
  bundling: 1/2 changesets (50.00%)
  bundling: 2/2 changesets (100.00%)
  bundling: 1/2 manifests (50.00%)
  bundling: 2/2 manifests (100.00%)
  bundling: foo1 1/2 files (50.00%)
  bundling: foo2 2/2 files (100.00%)
  preparing listkeys for "phases"
  sending listkeys command
  preparing pushkey for "phases:274cf95ba54c5a10b258b1d05a1be63e312e3573"
  sending pushkey command
  try to push obsolete markers to remote
  checking for updated bookmarks
  preparing listkeys for "bookmarks"
  sending listkeys command
  submitting 2 changesets for review
  sending pushreview command

  changeset:  1:775aa9c61e2b
  summary:    Bug 123 - Foo 1
  review:     http://localhost:$HGPORT1/r/2 (pending)

  changeset:  2:c90aa97490e2
  summary:    Bug 123 - Foo 2
  review:     http://localhost:$HGPORT1/r/3 (pending)

  review id:  bz://123/mynick
  review url: http://localhost:$HGPORT1/r/1 (pending)
  remote: adding changesets
  remote: adding manifests
  remote: adding file changes
  remote: added 2 changesets with 2 changes to 2 files

It seems like the SSH process's stderr stream isn't being flushed or read after each command.

When I insert breakpoints, I discover that the lines from unbundle ("adding changesets" etc) aren't read until sshpeer.cleanup() is called as part of sshpeer.__del__(). This seems to reliably occur after commands.push() has completed.

I've tried inserting remote.readerr() throughout sshpeer and the push code to try to get these lines to display. But they never show up.

I've tried inserting flushes of stderr in sshserver. But again, the data never seems to get to the other side of the pipe.

I thought that maybe the pipe was buffering things. I changed the implementation of util.Popen4() so bufsize=0. No difference.

I thought that maybe my dummyssh Python script was buffering. But stderr is unbuffered in Python.

At this point, I /think/ I've tried all the normal approaches to debugging I/O problems like this and have nothing to show for it. I'm baffled as to why OS X and Linux have different behavior. Usually their POSIX I/O behaviors are very similar. Perhaps this is a CPython oddity. I'm running Python 2.7.6 on Ubuntu 14.04 and 2.7.8 on OS X 10.9.4.

I'll keep trying to find the golden combination of flushes and polling to figure this out...
Comment 1 Gregory Szorc 2014-08-14 01:30 UTC
I pushed over a real SSH server and can confirm that dummyssh is not the problem.
Comment 2 Gregory Szorc 2014-08-15 22:39 UTC
The bug is in sshpeer.readerr()

    def readerr(self):
        while True:
            size = util.fstat(self.pipee).st_size
            if size == 0:
                break
            s = self.pipee.read(size)
            if not s:
                break
            for l in s.splitlines():
                self.ui.status(_("remote: "), l, '\n')

util.fstat() is calling os.fstat(fd).

I can't find it explicitly mentioned in any man pages, but apparently st_size for pipes on Linux is always 0.

The good news is that if you .read() from the pipe, data is available.

I think the fix is making the pipe non-blocking and reading all available data. Time to test some patches...
Comment 3 HG Bot 2014-08-20 02:30 UTC
Fixed by http://selenic.com/repo/hg/rev/331cbf088c4c
Gregory Szorc <gregory.szorc@gmail.com>
posix: implement readpipe using non-blocking I/O (issue4336)

On Linux, fstat().st_size of a pipe always returns 0, even if the
pipe has data available for reading. This meant that reading from
and subsequently printing the stderr pipe content after wireproto
commands over SSH meant that available data wasn't being printed.

We now implement pipe reading on POSIX by doing a non-blocking
read for all available data.

(please test the fix)
Comment 4 HG Bot 2014-08-20 02:30 UTC
Fixed by http://selenic.com/repo/hg/rev/8341c677c204
Gregory Szorc <gregory.szorc@gmail.com>
test-ssh: verify that stderr from remote is printed (issue4336)

The issue fixed in the previous patch was uncovered by implementing an
extension that printed additional output locally before the push command
completed. This test emulates that.

If this change is applied before the previous patch, the test will fail
on Linux, with the local output being printed before the "remote: "
lines.

(please test the fix)