[PATCH V2] posix: insert seek between reads and writes on Solaris (issue4943)

Matt Mackall mpm at selenic.com
Mon Dec 7 16:01:05 CST 2015


On Mon, 2015-12-07 at 13:27 -0800, Pierre-Yves David wrote:
> 
> On 12/07/2015 12:24 PM, Matt Mackall wrote:
> > On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote:
> > > On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David <
> > > pierre-yves.david at ens-lyon.org> wrote:
> > > 
> > > > So what is the state of this patch?
> > > > 
> > > > Is there any chance that we eventually take it as is, or are we
> > > > going to use a modified approach anyway?
> > > 
> > > 
> > > Some Solaris distributions are definitely busted. I believe mpm was
> > > investigating the scope of the bustage (OS-level POSIX semantics/bug
> > > versus Python behavior/bug).
> > 
> > As best I can tell from poring over the reports, it's a bug in the
> > stdio buffering layer in the C library and not in Python or the
> > kernel. To confirm that, we'd really need to write a pure C test case
> > to mirror Greg's minimal Python test.
> > 
> > The fact that we have the same bug in both Windows and (some) Solaris
> > makes me pretty anxious that it's more widespread and we're opening
> > ourselves up to corruption more widely.
> > 
> > Because the buffer is a complicated read/write cache that emulates the
> > file offset and the rules around writes in "a+" mode are a bit subtle,
> > it seems likely to me that bugs in this code could be somewhat
> > widespread. If we look at Py3's rewrite of this code in the io module,
> > it has indeed had a few bugs in this area of its own.
> > 
> > So I'm frankly leaning towards just inserting a seek in the revlog
> > code to be safe.
> 
> You mean doing the seek in all cases/OSes, right? This seems
> reasonable (modulo proper documentation).
> 
> What would be the downside of it? Is there an expected performance
> hit?

Unknown, but probably minimal. The seek probably still goes through the
stdio buffer code and just injects some sanity. Correctly written
buffer code can potentially still elide sending the seek on to the
kernel. And even if it doesn't, a simple syscall like this on Linux is
actually pretty similar in speed to a Python function call.
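
For illustration, here is a minimal Python sketch of the defensive pattern
discussed above: open a file in "a+" mode, read from it, then issue an
explicit seek before writing. The helper names (append_after_read, demo)
are hypothetical and this is not the actual revlog change. It only mirrors
the C stdio rule that a positioning call must separate input from output
on an update stream.

```python
import os
import tempfile


def append_after_read(path, payload):
    """Read the existing contents, then append payload.

    Hypothetical helper: the explicit seek between the read and the
    write is the "inject some sanity" step, so a buggy buffering layer
    does not carry read state over into the write.
    """
    with open(path, "a+b") as fp:
        fp.seek(0)                 # "a+" may start positioned at EOF
        existing = fp.read()
        fp.seek(0, os.SEEK_END)    # seek between the read and the write
        fp.write(payload)
    return existing


def demo():
    """Create a scratch file, append after a read, return both views."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fp:
            fp.write(b"first\n")
        before = append_after_read(path, b"second\n")
        with open(path, "rb") as fp:
            after = fp.read()
        return before, after
    finally:
        os.remove(path)
```

On a correct implementation the seek changes nothing observable, which is
why the expected cost is small; on a broken one it should force the buffer
back into a consistent state before the append.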

-- 
Mathematics is the supreme nostalgia of our time.


