A possible explanation for random "stream ended unexpectedly (got m bytes, expected n)"

Jun Wu quark at fb.com
Sat Mar 25 21:24:26 EDT 2017


May I ask what version of Python are you using? If it's < 2.7.4, the EINTR
issue is expected.

Excerpts from Mike Hommey's message of 2017-03-26 09:08:25 +0900:
> On Sat, Mar 25, 2017 at 10:34:02AM -0700, Gregory Szorc wrote:
> > On Sat, Mar 25, 2017 at 4:19 AM, Mike Hommey <mh at glandium.org> wrote:
> > 
> > > Hi,
> > >
> > > I don't know about you, but occasionally, I've hit "stream ended
> > > unexpectedly (got m bytes, expected n)" errors that didn't make sense.
> > > Retrying would always work.
> > >
> > > Recently, I was trying to use signal.setitimer and a signal handler for
> > > some memory profiling on git-cinnabar, which uses mercurial as a
> > > library, and got "stream ended 4 unexpectedly (got m bytes, expected n)"
> > > *very* reproducibly. Like, with an interval timer firing every second,
> > > it took only a few seconds to hit the error during a clone.
> > >
> > > I'm pretty sure this can be reproduced with a similar setup in mercurial
> > > itself.
> > >
> > > Now, the reason this happens in this case is that, the code that fails
> > > does:
> > >
> > > def readexactly(stream, n):
> > >     '''read n bytes from stream.read and abort if less was available'''
> > >     s = stream.read(n)
> > >     if len(s) < n:
> > >         raise error.Abort(_("stream ended unexpectedly"
> > >                            " (got %d bytes, expected %d)")
> > >                           % (len(s), n))
> > >     return s
> > >
> > > ... and thanks to POSIX, interrupted reads can lead to short reads. So,
> > > you request n bytes, and get less, just because something interrupted
> > > the process.  The problem then is that python doesn't let you know why
> > > you just got a short read, and you have to figure that out on your own.
> > >
> > > The same kind of problem is also true to some extent on writes.
> > >
> > > Now, the problem is that this piece of code is likely the most visible
> > > place where the issue exists, but there are many other places in the
> > > mercurial code base that are likely affected.
> > >
> > > And while the signal.setitimer case is a corner case (and I ended up
> > > using a separate thread to work around the problem ; my code wasn't
> > > interruption safe either anyways), I wonder if those random "stream
> > > ended unexpectedly (got m bytes, expected n)" errors I was getting under
> > > normal circumstances are not just a manifestation of the same underlying
> > > issue, which is that the code doesn't like interrupted reads.
> > >
> > > Disclaimer: I'm not going to work on fixing this issue, but I figured
> > > I'd let you know, in case someone wants to look into it more deeply.
> > >
> > 
> > Thank you for writing this up. This "stream ended unexpectedly" has been a
> > thorn in my side for a while, as it comes up frequently in Mozilla's CI
> > with a frequency somewhere between 1 in 100-1000. Even retrying failed
> > operations multiple times isn't enough to overcome it
> > 
> > I have long suspected interrupted system calls as a likely culprit.
> > However, when I initially investigated this a few months ago, I found that
> > Python's I/O APIs retry automatically for EINTR. See
> > https://hg.python.org/cpython/file/54c93e0fe79b/Lib/socket.py#l365  for
> > example. This /should/ make e.g. socket._fileobject.read() resilient
> > against signal interruption. (If Python's I/O APIs didn't do this, tons of
> > programs would break. Also, the semantics of .read() are such that it is
> > always supposed to retrieve all available bytes until EOF - at least for
> > some io ABCs. read1() exists to perform at most 1 system call.)
> 
> Note that EINTR is not the only way read() can end from interruption:
> 
>    If a read() is interrupted by a signal before it reads any data, it
>    shall return -1 with errno set to [EINTR].
> 
>    If a read() is interrupted by a signal after it has successfully read
>    some data, it shall return the number of bytes read.
> 
>    From http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html 
> 
> But that's POSIX, Windows is another story. Recv is different too.
> 
> On Sat, Mar 25, 2017 at 12:00:42PM -0700, Gregory Szorc wrote:
> > Can you please provide more detailed steps to reproduce?
> > 
> > I added the following code at the top of exchange.pull:
> > 
> >     def sighandler(sig, stack):
> >         pass
> > 
> >     import signal
> >     signal.signal(signal.SIGALRM, sighandler)
> >     signal.setitimer(signal.ITIMER_REAL, 1.0, 1.0)
> > 
> > However, I was unable to reproduce the "stream ended unexpectedly" failure
> > when cloning a Firefox repo from hg.mozilla.org. And I even tried with the
> > interval set to 1ms.
> 
> So, I tried to reproduce again with my original testcase, and failed to.
> In fact, instead I was getting urllib2.URLError: <urlopen error [Errno 4]
> Interrupted system call> errors. Delaying the initial SIGALRM to go past
> the start of the HTTP request didn't fail either. Even small intervals.
> 
> Then I tried on another machine and it happened reliably... and I
> figured it was because I had a config difference between the two
> machines, where in the latter case git-cinnabar would not use mercurial
> to talk to the server (but it would still use it to read the bundle2,
> and fail in mercurial.changegroup.readexactly).
> 
> The main difference is that in the latter case, the underlying
> "connection" is a pipe instead of a socket. This clearly influences the
> behavior on interruption from a signal. It's actually rather easy to get
> short reads with pipes. Short reads from recv are probably more
> difficult to reproduce.
> 
> Mike


More information about the Mercurial-devel mailing list