Bug 5658 - I/O error writing to stdout/stderr can lead to abandoned transactions
Summary: I/O error writing to stdout/stderr can lead to abandoned transactions
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: 4.2.2
Hardware: All All
: urgent bug
Assignee: Bugzilla
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2017-08-14 14:51 UTC by Gregory Szorc
Modified: 2017-10-17 00:00 UTC (History)
3 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gregory Szorc 2017-08-14 14:51 UTC
Upon upgrading various servers at Mozilla from 4.1.2 to 4.2.2, we started seeing a number of abandoned transactions on the server. This appears to occur with both SSH and HTTP servers.

I believe e9646ff34d55 and 1bfb9a63b98e are to blame.

Before those changesets, various IOError during ui._write_err() and ui.flush() were ignored. After those changesets, they get converted into exceptions.

What I think is happening is something like this:

1) Client connects
2) Transaction opened on server
3) Client disconnects via ^C
4) Server-side process keeps going or triggers a KeyboardInterrupt, which causes the transaction to unwind
5) `self.report(_("transaction abort!\n"))` from transaction._abort (or any other write or flush on ui) attempts to write to a pipe that has disappeared, triggering IOError.
6) Post e9646ff34d55 and 1bfb9a63b98e, the IOError/StdioError is unhandled by transaction.py, leading to an orphaned journal file and abandonded transaction errors

Essentially, the underlying bug is that IOError on stdout and stderr interfere with transaction semantics.
Comment 1 Bugzilla 2017-08-25 00:00 UTC
Bug marked urgent for 10 days, bumping
Comment 2 Yuya Nishihara 2017-08-25 09:12 UTC
Fixed by cde4cfeb6e3e "ui: restore behavior to ignore some I/O errors
(issue5658)"
Comment 3 Bugzilla 2017-09-02 00:00 UTC
Bug was set to TESTING for 7 days, resolving
Comment 4 HG Bot 2017-10-04 18:07 UTC
Fixed by https://mercurial-scm.org/repo/hg/rev/2debf1e3cfa4
Gregory Szorc <gregory.szorc@gmail.com>
tests: test behavior of IOError during transactions (issue5658)

ui._write(), ui._write_err(), and ui.flush() all trap IOError and
re-raise as error.StdioError. If a caller doesn't catch StdioError
when writing to stdio, it could bubble all the way to dispatch.

This commit adds tests for I/O failures around various transaction
operations.

The most notable badness is during abort. Here, an uncaught StdioError
will result in incomplete transaction rollback, requiring an
`hg rollback` to recover. This can result in a client "corrupting"
a remote repo via terminated HTTP and SSH socket.

(please test the fix)
Comment 5 Bugzilla 2017-10-17 00:00 UTC
Bug was set to TESTING for 12 days, resolving