I've come across what seems to be a regression with 3.6 running on SunOS 5.11. I get integrity check failures during cloning and decompression problems when doing a 'pull -u'; I'm not sure whether the two are directly related. I don't have this problem with 3.5.2 (and had not noticed it earlier).

richard@omnis:/home/richard/src/tryton$ hg pull -uv
pulling from http://hg.tryton.org/tryton/
searching for changes
all local heads known remotely
adding changesets
adding manifests
transaction abort!
rollback completed
abandon : integrity check failed on 00manifest.i:4252 !

richard@omnis:/home/richard/src/tryton$ cd ..
richard@omnis:/home/richard/src$ rm -rf tryton
richard@omnis:/home/richard/src$ hg clone http://hg.tryton.org/tryton
destination directory: tryton
requesting all changes
adding changesets
adding manifests
transaction abort!
rollback completed
abandon : integrity check failed on 00manifest.i:154 !

I had similar results with the trytond repository on the same server, as well as with https://hg.java.net/hg/solaris-userland~gate
BTW, I'm using pkgsrc trunk. The following is an extract from the test results:

Failed test-acl.t: output changed
Failed test-obsolete.t: output changed
Failed test-phases-exchange.t: output changed
Failed test-clone.t: output changed
Failed test-subrepo-git.t: output changed and returned error code 1
Failed test-largefiles-update.t: output changed
Failed test-graft.t: output changed
Failed test-tag.t: output changed
Failed test-run-tests.t: output changed
Failed test-treediscovery.t: output changed
Failed test-histedit-fold.t: output changed
Failed test-rebase-scenario-global.t: output changed
Failed test-censor.t: output changed
Failed test-treediscovery-legacy.t: output changed
Failed test-rebase-parameters.t: output changed
Failed test-hardlinks.t: output changed
Failed test-rebase-conflicts.t: output changed and returned error code 1
Failed test-rebase-detach.t: output changed
Failed test-generaldelta.t: output changed
Failed test-bundle-type.t: output changed
Failed test-pull-pull-corruption.t: output changed
# Ran 458 tests, 32 skipped, 0 warned, 21 failed.
python hash seed: 2178760656
Bug mentions a regression, bumping to urgent.
Can you run the clone with --debug? Are you on x86?
Created attachment 1873 [details] hg clone --debug
Yes, this is on i386 (ABI=32) pkgsrc running on SunOS 5.11 (illumos)
Can I get you to try to bisect the regression?
Alternately, point us to a ready-to-use 32-bit VM image.
Unfortunately I don't build from the repo but with pkgsrc. I can give you a debug traceback though:

richard@omnis:/home/richard/src/tryton$ hg pull -uv --debugger
entering debugger - type c to continue starting hg or h for help
--Call--
> /opt/local/lib/python2.7/contextlib.py(21)__exit__()
-> def __exit__(self, type, value, traceback):
(Pdb) c
pulling from http://hg.tryton.org/tryton/
searching for changes
all local heads known remotely
adding changesets
adding manifests
transaction abort!
rollback completed
Traceback (most recent call last):
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 187, in _runcatch
    return _dispatch(req)
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 920, in _dispatch
    cmdpats, cmdoptions)
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 679, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 1051, in _runcommand
    return checkargs()
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 1011, in checkargs
    return cmdfunc()
  File "/opt/local/lib/python2.7/site-packages/mercurial/dispatch.py", line 917, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/opt/local/lib/python2.7/site-packages/mercurial/util.py", line 801, in check
    return func(*args, **kwargs)
  File "/opt/local/lib/python2.7/site-packages/mercurial/commands.py", line 5324, in pull
    opargs=pullopargs).cgresult
  File "/opt/local/lib/python2.7/site-packages/mercurial/exchange.py", line 1119, in pull
    _pullchangeset(pullop)
  File "/opt/local/lib/python2.7/site-packages/mercurial/exchange.py", line 1316, in _pullchangeset
    pullop.cgresult = cg.apply(pullop.repo, 'pull', pullop.remote.url())
  File "/opt/local/lib/python2.7/site-packages/mercurial/changegroup.py", line 372, in apply
    self._unpackmanifests(repo, revmap, trp, prog, changesets)
  File "/opt/local/lib/python2.7/site-packages/mercurial/changegroup.py", line 299, in _unpackmanifests
    repo.manifest.addgroup(self, revmap, trp)
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1532, in addgroup
    alwayscache=bool(addrevisioncb))
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1407, in _addrevision
    text = buildtext()
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1306, in buildtext
    basetext = self.revision(self.node(baserev), _df=fh)
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1117, in revision
    text = self._checkhash(text, node, rev)
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1132, in _checkhash
    self.checkhash(text, p1, p2, node, rev)
  File "/opt/local/lib/python2.7/site-packages/mercurial/revlog.py", line 1141, in checkhash
    % (self.indexfile, revornode))
RevlogError: integrity check failed on 00manifest.i:4252
> /opt/local/lib/python2.7/site-packages/mercurial/revlog.py(1141)checkhash()
-> % (self.indexfile, revornode))
(Pdb)
I don't see this problem on Solaris (S12, build 86, with mercurial several changesets newer than 3.6). It looks like disk (or possibly network) corruption. Running on tmpfs might validate the former, and transferring a clean, correct copy in a tarball might validate the latter.
> Unfortunately I don't build from the repo but with pkgsrc.

Unfortunately, we don't run Solaris, so we're at a bit of an impasse. Someone is going to need to do some compiling to make progress on this issue.
I made some significant changes to how revlog I/O works in 3.6. https://selenic.com/repo/hg/rev/39d643252b9f and https://selenic.com/repo/hg/rev/56a640b0f656 are the ones that would likely cause fallout. However, Solaris being a UNIX/POSIX, I wouldn't expect its behavior to differ from Linux or OS X. So I'd be somewhat surprised if these patches broke Solaris. Perhaps someone with more Solaris knowledge could read the commit messages and see if it sets off any alarms.
Built with 'make local' in /tmp and this is what I saw:

richard@omnis:/tmp/build_/mercurial-3.6$ ./hg clone http://hg.tryton.org/tryton
destination directory: tryton
requesting all changes
adding changesets
adding manifests
transaction abort!
rollback completed
abandon : revlog decompress error: Error -5 while decompressing data: incomplete or truncated stream !
BTW, I just tried putting back the two flush blocks (not touching the inline stuff) and I can both clone and pull -u again. To complete the information on my side: omnios native still uses python 2.6, but in pkgsrc I'm using python 2.7. I'm curious as to why Oracle doesn't see this... I guess I can rule out zfs, since /tmp is tmpfs and I experience the same there as on my zfs data volume. Perhaps it's a difference in python or in the underlying system calls.
Coming across https://www.turnkeylinux.org/blog/unix-buffering, I notice in the mercurial source that there are a number of fdopens that use default buffering. I'm curious - if the hg developers want to keep the flush() out of the code, perhaps zero should be specified instead of default buffering:

> os.fdopen(fd[, mode[, bufsize]])
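For illustration, here is a minimal sketch of the unbuffered-open idea suggested above, using the builtin open() (the equivalent of passing bufsize=0 to os.fdopen). The path and data are made up; this is not Mercurial's actual revlog code, just a demonstration that with buffering disabled every write reaches the OS before the subsequent read.

```python
import os
import tempfile

# Hypothetical file standing in for a revlog; path is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "00manifest.i")

# buffering=0 (binary mode required) disables userspace buffering entirely,
# so a later read() cannot observe stale, un-flushed data.
f = open(path, "a+b", 0)
f.write(b"\x00\x01" * 8)   # bytes hit the file descriptor immediately
f.seek(0)
data = f.read()
f.close()
print(len(data))
```

The trade-off, as discussed later in this thread, is one system call per write instead of coalesced buffered writes.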
Richard: can I get you to actually use hg bisect to pinpoint the problematic revision? It's probably related to flush, but I'm hesitant to mark it confirmed until we're a bit more systematic about it. Can we get more precise information about your kernel / distro / filesystem / python to try to pin down why you see the problem and Danek doesn't?
(In reply to Matt Mackall from comment #15)

As for my system, it is the OmniOS Bloody release (r151017, omnios-2a4e66a): http://omnios.omniti.com/media/OmniOS_Text_bloody_20151109.iso which is based upon illumos (SunOS 5.11). My primary environment is pkgsrc (which is python 2.7), but as indicated above I also tested on OmniOS native, which is 2.6.

Again, as for bisecting, you will probably need to be quite specific and guide me, as I am primarily git and cvs based and use mercurial only on occasion to access certain repositories such as solaris userland and X, as well as tryton/trytond... I only build mercurial from the tarball (either pkgsrc or omnios-build).

Is there any particular reason to expect that the flush() suppressions are reasonable with default buffering (and not only with buffering disabled)? My layman's sense is that buffering is probably a good thing and that flush() is absolutely necessary to synchronise the files written for subsequent reads.
Basic bisecting outline:

clone and build the hg repo:

  hg clone https://selenic.com/hg test-hg
  cd test-hg
  make local

start a bisect (with your system/pkgsrc hg):

  hg bisect --reset
  hg bisect --good 3.5.2
  hg bisect --bad 3.6

test the chosen revision:

  make local
  ./hg clone http://...   # use the local hg

mark the resulting test good or bad:

  hg bisect --good/bad

repeat the last two steps until it tells you what broke (~8 tests)
richard@omnis:/tmp/test-hg$ hg bisect --good
La première révision présentant le problème est :
changeset:   26427:97dc6ab42aad
parent:      26426:0486c16ce621
parent:      26419:93bfa9fc96e3
user:        Matt Mackall <mpm@selenic.com>
date:        Thu Oct 01 12:17:46 2015 -0500
summary:     merge with stable

Not all ancestors of this changeset have been checked. Use bisect --extend to continue the bisection from the common ancestor, c15d8f84343e.
sorry for forgetting to set LANG=C first.. the first extend is as follows:

richard@omnis:/tmp/test-hg$ hg bisect --bad
The first bad revision is:
changeset:   26409:19d946cf2a1b
parent:      26408:2b31d8636f25
parent:      26405:c15d8f84343e
user:        Matt Mackall <mpm@selenic.com>
date:        Tue Sep 29 14:33:31 2015 -0500
summary:     merge with stable

Not all ancestors of this changeset have been checked. Use bisect --extend to continue the bisection from the common ancestor, f31ddc9bfa5f.

the second extend was pitifully slow, but I only bisected '--bad' with a failure:

richard@omnis:/tmp/test-hg$ hg bisect --good
The first bad revision is:
changeset:   26379:39d643252b9f
user:        Gregory Szorc <gregory.szorc@gmail.com>
date:        Sun Sep 27 16:08:18 2015 -0700
summary:     revlog: use existing file handle when reading during _addrevision

that seems to be it.
the first hunk is:

richard@omnis:/tmp/test-hg$ hg diff -c 26379:39d643252b9f
diff -r e749707f0afb -r 39d643252b9f mercurial/revlog.py
--- a/mercurial/revlog.py	Sun Sep 27 15:59:19 2015 -0700
+++ b/mercurial/revlog.py	Sun Sep 27 16:08:18 2015 -0700
@@ -1289,10 +1289,6 @@
         def buildtext():
             if btext[0] is not None:
                 return btext[0]
-            # flush any pending writes here so we can read it in revision
-            if dfh:
-                dfh.flush()
-            ifh.flush()
             baserev = cachedelta[0]
             delta = cachedelta[1]
             # special case deltas which replace entire base; no need to decode
I'm going to stand up a VM and look into this. I'm leaning towards an OS or CPython bug with a potential workaround in our VFS layer so we don't regress performance on platforms without the bug.
I was unable to get pkgsrc bootstrapped and installed on OmniOS. However, I did install Joyent's binary package set following the 64-bit instructions at https://pkgsrc.joyent.com/install-on-illumos/ and I was able to reproduce this failure by cloning https://selenic.com/repo/hg with 3.6.1. Now begins the investigation of root cause.
BTW, I just tested on SunOS 5.11 oi_151a9 with the same results as current OmniOS.
I built Python 2.7.10 from source and reproduced the bug. This should rule out a patch to CPython from the packaging system having introduced the bug.
(In reply to Gregory Szorc from comment #22)

You don't need to install pkgsrc, as the following will install the OmniOS native package:

  # pkg install developer/versioning/mercurial

You can then hg clone https://selenic.com/hg test-hg to do the same. (Actually, I used native when I did my tests.)
CPython buffered file I/O is implemented by calling setvbuf() or setbuf(): https://hg.python.org/cpython/file/017d77d001c5/Objects/fileobject.c#l509

Things are pointing to an issue calling read() to obtain data that hasn't been flushed past the fd buffer. It's also possible the seek isn't interacting well with the buffered file descriptor. Or we need to seek to EOF before writing (a possible difference in how O_APPEND works on Solaris).

As a refresher, our system calls for writing revlogs on Linux now look like:

  # Open revlog for read+write
  fd = open(revlog, O_RDWR|O_CREAT|O_APPEND, 0666)
  # (there is a setvbuf() here)

  # Write new revlog data in chunks of 4k.
  write(fd, <4k>)
  write(fd, <4k>)
  ...

  # Read full text to start new delta chain. Needs to read revlog chunks.
  lseek(fd, off, SEEK_SET)
  read(fd, <chunk size>)
  read(fd, <chunk size>)
  ...

  # Write new revlog data
  write(fd, <4k>)
  write(fd, <4k>)
  ...

Before 39d643252b9f, we opened the fd in append-only mode and performed a fflush() between the write and read operations. I'll keep poking around.
System calls between Linux and Solaris mostly align.

Linux:

open("/home/gps/tmp/repo/.hg/store/00manifest.i", O_RDWR|O_CREAT|O_APPEND, 0666) = 6
fstat(6, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
write(4, "00manifest.i\0000\n", 15) = 15
fstat(6, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb2e72fd000
write(6, "\0\1\0\1\0\0\0\0\0\0\1\305\0\0\2\327\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377"..., 2703) = 2703
lseek(6, 0, SEEK_SET) = 0
read(6, "\0\1\0\1\0\0\0\0\0\0\1\305\0\0\2\327\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377"..., 65536) = 2703
read(6, "", 61440) = 0
write(6, "\0\0\0\0\6O\0\0\0\0\2\r\0\0\3?\0\0\0\21\0\0\0\21\0\0\0\20\377\377\377\377"..., 3333) = 3333
lseek(6, 0, SEEK_SET) = 0
read(6, "\0\1\0\1\0\0\0\0\0\0\1\305\0\0\2\327\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377"..., 65536) = 6036
read(6, "", 57344) = 0
write(6, "\0\0\0\0\16\24\0\0\0\0\2w\0\0\3\361\0\0\0&\0\0\0&\0\0\0%\377\377\377\377"..., 3085) = 3085
lseek(6, 0, SEEK_SET) = 0
read(6, "\0\1\0\1\0\0\0\0\0\0\1\305\0\0\2\327\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377"..., 65536) = 9121
read(6, "", 53248) = 0

Solaris:

open("/root/tmp/repo/.hg/store/00manifest.i", O_RDWR|O_APPEND|O_CREAT, 0666) = 6
fstat(6, 0xFFFFFD7FFFDFD0E0) = 0
write(4, " 0 0 m a n i f e s t . i".., 15) = 15
fstat(6, 0xFFFFFD7FFFDFD860) = 0
fstat(6, 0xFFFFFD7FFFDFD790) = 0
ioctl(6, TCGETA, 0xFFFFFD7FFFDFD810) Err#25 ENOTTY
write(6, "\001\001\0\0\0\0\0\001C5".., 2703) = 2703
lseek(6, 0, SEEK_SET) = 0
read(6, "\001\001\0\0\0\0\0\001C5".., 131072) = 2703
read(6, 0x00E22814, 131072) = 0
lseek(6, 0, SEEK_SET) = 0
read(6, "\001\001\0\0\0\0\0\001C5".., 131072) = 2703
read(6, 0x00E22814, 131072) = 0
lseek(6, 0, SEEK_CUR) = 2703
close(6) = 0

What's weird on Solaris is that read() is properly returning 2703 bytes yet we immediately try to read again. Time to go into a debugger...
Now I'm getting somewhere. Stepping through a Python debugger while writing 00manifest.i, I see some funky behavior.

We're able to write() the low-level data to the buffered revlog file. On disk, the file is 0 sized until _loadchunk() (called when creating a new delta chain) seeks the file to offset 0 (https://selenic.com/repo/hg/file/2da6a2dbfc42/mercurial/revlog.py#l961), at which point the size on disk is reported as 2703 bytes. We are able to read the first 2703 bytes just fine. So far so good.

Sometime after we seek() and read(), we go to write() more entries to the manifest revlog. Before we write anything, tell() says we're at offset 2703. That's expected. However, after we make our first write() (64 bytes), tell() reports an offset of -3377! Wat.

After that, things are really messed up. The next time we go to seek() and read(), the file size stays at 2703 bytes and the newly written data is nowhere to be found!

Calling seek(0, SEEK_END) between the read() and write() seems to make the problem go away.
Created attachment 1877 [details] Python failure repro

This python program demonstrates the bug on Solaris. Output on Solaris:

opened testio in a+
wrote 5 bytes
seeked to beginning
offset: 0
read: 5
offset: 5
wrote 5 bytes
offset: -131062
seeked to beginning
offset: 0
read: 5
offset: 5
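The attachment itself isn't inlined in this thread, but its output above is enough to reconstruct the shape of the repro; the names and exact statements below are guesses, not the literal attachment 1877. On the affected illumos/Python 2 builds, the offset after the second write() came back as -131062 and the appended bytes vanished; on a healthy platform the printed offsets are 0, 5, 10 and the final read returns 10 bytes.

```python
import os
import tempfile

# Path is illustrative; the original wrote a file named "testio".
path = os.path.join(tempfile.mkdtemp(), "testio")

f = open(path, "a+b")
print("opened testio in a+")

f.write(b"12345")
print("wrote 5 bytes")

f.seek(0)
print("seeked to beginning")
print("offset:", f.tell())

data = f.read()
print("read:", len(data))
print("offset:", f.tell())

f.write(b"67890")           # second write, with no seek after the read
print("wrote 5 bytes")
print("offset:", f.tell())  # -131062 on the broken setup

f.seek(0)
print("seeked to beginning")
print("read:", len(f.read()))  # 5 on the broken setup: data was lost
f.close()
```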
It's worth noting that Windows and Solaris appear to have the same behavior w.r.t. requiring a seek() between read() and write() calls. We already have a workaround for Windows:

changeset:   26375:3686fa2b8eee
user:        Gregory Szorc <gregory.szorc@gmail.com>
date:        Sun Sep 27 18:46:53 2015 -0700
summary:     windows: insert file positioning call between reads and writes
I'm still not seeing it on Solaris, either 11.2, 11.3 or S12b86. I ran it on files on ZFS, tmpfs, and NFS, and with Python 2.6 (32-bit), 2.7 (32 and 64-bit), and 3.4 (64-bit). All do what you'd expect. It's possible that Oracle fixed some bug since the closure of OpenSolaris, or that Illumos introduced one. If it's the former, I don't know when; finding machines running older bits is difficult. Besides, Oracle doesn't support anything older than 11.2, anyway.
This bug is not present in Python 3. I tested all the way down to version 3.0.1. It looks like Python 3 is inserting seeks automatically, since system call tracing reveals extra lseek() calls. I'm having a difficult time finding the C code where it actually does this, since they refactored the code significantly and I'm not familiar with the 3.5 source. It also looks like they removed the setbuf()/setvbuf() calls. bufferedio.c seems to indicate they implemented their own buffering independent of the C stdlib. We might want to hit up a CPython contributor for the history lesson.

Anyway, I'll send a patch to make the Windows VFS layer seek workaround active on Solaris. This will result in more system calls on Solaris. Fortunately it shouldn't have a significant performance impact, since the removal of these calls did not appear to significantly impact performance on POSIX native platforms.
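A quick way to see the Python 3 difference described above: binary update-mode files are wrapped in io.BufferedRandom, Python 3's own buffering layer (independent of C stdlib setvbuf()), which reconciles the read and write positions itself, so interleaved reads and writes on an "a+b" handle behave correctly. A small sketch, with an illustrative path:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "00manifest.i")
f = open(path, "a+b")

# Python 3 gives us its own buffered object, not a stdio FILE*.
print(type(f).__name__)                     # BufferedRandom
assert isinstance(f, io.BufferedRandom)

f.write(b"hello")
f.seek(0)
assert f.read() == b"hello"
f.write(b"world")   # interleaved write after a read; the io layer reseeks
f.seek(0)
print(f.read())
f.close()
```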
(In reply to Gregory Szorc from comment #29)

Hmm, if I change:

  fd = open(path, 'a+b')

to

  fd = open(path, 'a+b', 0)

things seem okay. You should be able to do something similar by setting O_SYNC explicitly on open (or just after open), or simply use flush() to take advantage of the default buffering.
Buffered I/O is a good thing because it avoids potentially expensive overhead. If we were to remove buffered I/O, we'd likely increase system calls by at least an order of magnitude when cloning/pulling/unbundling.
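A rough illustration of that order-of-magnitude claim: count how many times the OS-level write() is invoked when appending many small records through a buffered layer versus an unbuffered handle. The record size, count, and file names here are arbitrary choices for the sketch, not Mercurial's actual numbers.

```python
import io
import os
import tempfile

class CountingFile(io.FileIO):
    """Raw file that counts OS-level write() calls."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.writes = 0

    def write(self, b):
        self.writes += 1
        return super().write(b)

tmp = tempfile.mkdtemp()

# Unbuffered: every record is its own system call.
raw_unbuf = CountingFile(os.path.join(tmp, "unbuffered"), "w")
for _ in range(1000):
    raw_unbuf.write(b"x" * 64)
raw_unbuf.close()

# Buffered: records are coalesced in a 64k buffer before hitting the OS.
raw = CountingFile(os.path.join(tmp, "buffered"), "w")
buf = io.BufferedWriter(raw, buffer_size=65536)
for _ in range(1000):
    buf.write(b"x" * 64)
buf.close()  # flushes the buffer

print(raw_unbuf.writes)   # 1000
print(raw.writes)         # a handful at most
```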
Bug marked urgent for 10 days, bumping
(In reply to Gregory Szorc from comment #34)
> Anyway, I'll send a patch to make the Windows VFS layer seek workaround active
> on Solaris. This will result in more system calls on Solaris. Fortunately it
> shouldn't have a significant performance impact since the removal of these
> calls did not appear to significantly impact performance on POSIX native
> platforms.

Can you confirm whether this has landed in 3.6.1? I see the same error as in 3.6.
This has not landed yet. We are investigating the extent of the issue on other Unix platforms.
latest status on this (from the mailing list) https://selenic.com/pipermail/mercurial-devel/2015-December/076589.html
*** Bug 5009 has been marked as a duplicate of this bug. ***
Fixed by https://selenic.com/repo/hg/rev/e240e914d226

Gregory Szorc <gregory.szorc@gmail.com>
revlog: seek to end of file before writing (issue4943)

Revlogs were recently refactored to open file handles in "a+" and use a persistent file handle for reading and writing. This drastically reduced the number of file handles being opened. Unfortunately, it appears that some versions of Solaris lose the file offset when performing a write after the handle has been seeked.

The simplest workaround is to seek to EOF on files opened in a+ mode before writing to them, which is what this patch does. Ideally, this code would exist in the vfs layer. However, this would require creating a proxy class for file objects in order to provide a custom implementation of write(). This would add overhead. Since revlogs are the only files we open in a+ mode, the one-off workaround in revlog.py should be sufficient.

This patch appears to have little to no impact on performance on my Linux machine.

(please test the fix)
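The idea behind e240e914d226 can be sketched in a few lines: on an "a+" handle, explicitly seek to EOF before each write, so a stale offset left behind by an intervening read can never matter. This mirrors the workaround's concept, not Mercurial's actual revlog code; the path and payloads are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "00manifest.i")
fh = open(path, "a+b")

fh.write(b"A" * 100)        # append some "revlog" data

fh.seek(0)
chunk = fh.read()           # reading moves the shared file offset

fh.seek(0, os.SEEK_END)     # the fix: reposition to EOF before writing
fh.write(b"B" * 50)         # write lands after the existing data
fh.close()

print(os.path.getsize(path))
```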
This patch seems to get over the issue (I applied it to 3.6.2). I do notice a difference in performance, though: roughly speaking, there seems to be on average 25-50% more system time compared with 3.5.2, though I haven't actually profiled the differences.
Bug was set to TESTING for 7 days, resolving