Bug 4546 - Commits that add new files fail when written to a windows network share.
Summary: Commits that add new files fail when written to a windows network share.
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: 3.3
Hardware: PC Windows
: critical bug
Assignee: Matt Mackall
URL:
Keywords:
: 4533 4545 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-02-20 07:02 UTC by Deanna Earley
Modified: 2015-03-31 13:42 UTC (History)
10 users (show)

See Also:
Python Version: ---


Attachments
Process monitor export of the failure (725.87 KB, application/vnd.ms-excel)
2015-02-20 07:06 UTC, Deanna Earley
Details
Screenshot of Explorer failing (59.75 KB, image/png)
2015-02-20 11:12 UTC, Deanna Earley
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Deanna Earley 2015-02-20 07:02 UTC
Whenever I commit, or push any changes that add new files to a repository over a windows network share, the commit fails with:
IOError: [Errno 2] No such file or directory: 'P:\\repotest\\.hg/store\\journal.
backup.fncache'

This is reproducible with a freshly created repository:

P:\repotest>hg init

P:\repotest>hg add file1.txt

P:\repotest>hg commit -m commit1

P:\repotest>hg add file2.txt

P:\repotest>hg commit -m commit2
Traceback (most recent call last):
  File "mercurial\dispatch.pyo", line 140, in _runcatch
  File "mercurial\dispatch.pyo", line 860, in _dispatch
  File "mercurial\dispatch.pyo", line 621, in runcommand
  File "mercurial\dispatch.pyo", line 951, in _runcommand
  File "mercurial\dispatch.pyo", line 922, in checkargs
  File "mercurial\dispatch.pyo", line 857, in <lambda>
  File "mercurial\util.pyo", line 711, in check
  File "mercurial\extensions.pyo", line 149, in wrap
  File "mercurial\util.pyo", line 711, in check
  File "hgext\mq.pyo", line 3483, in mqcommand
  File "mercurial\util.pyo", line 711, in check
  File "mercurial\commands.pyo", line 1505, in commit
  File "mercurial\cmdutil.pyo", line 2212, in commit
  File "mercurial\commands.pyo", line 1499, in commitfunc
  File "hgext\largefiles\reposetup.pyo", line 267, in commit
  File "hgext\mq.pyo", line 3386, in commit
  File "mercurial\localrepo.pyo", line 64, in wrapper
  File "mercurial\localrepo.pyo", line 1397, in commit
  File "hgext\largefiles\reposetup.pyo", line 247, in commitctx
  File "mercurial\localrepo.pyo", line 64, in wrapper
  File "mercurial\localrepo.pyo", line 1499, in commitctx
  File "mercurial\transaction.pyo", line 25, in _active
  File "mercurial\transaction.pyo", line 408, in close
  File "mercurial\transaction.pyo", line 465, in _writeundo
  File "mercurial\util.pyo", line 733, in copyfile
  File "shutil.pyo", line 82, in copyfile
IOError: [Errno 2] No such file or directory: 'P:\\repotest\\.hg/store\\journal.
backup.fncache'
abort: No such file or directory: P:\repotest\.hg/store\journal.backup.fncache
transaction abort!
failed to recover fncache
rollback completed

P:\repotest>

When I look in Explorer, that file exists and has a date before the failure (presumably, it's a copied file)

If I do the same process on a local file system, it will work, and then trigger the same issue when I push to a network share.

The network admin says nothing obvious has changed in the network structure or management.

This occurs on Windows 7 with Mercurial 3.3
Comment 1 Deanna Earley 2015-02-20 07:06 UTC
Created attachment 1811 [details]
Process monitor export of the failure

Attached is a Process Monitor log of the failure.
The raw log is available at https://dl.dropboxusercontent.com/u/2931731/Bugs/mercurial/4546/procmon.PML
Comment 2 bloops 2015-02-20 10:15 UTC
Same problem here. Even worse, after trying to push to the repository located on a network share, that remote repository is corrupted (despite the message "rollback completed").

After running hg verify, there is a "missing revlog!" error for (nearly?) each file with the following result:

241 files, 180 changesets, 859 total revisions
421 integrity errors encountered!
(first damaged changeset appears to be 0)

Please note: hg verify reports no errors for the local repository and no errors for the remote repository before the "push".
Comment 3 Deanna Earley 2015-02-20 11:04 UTC
I've just tried disabling all extensions and the issue remains:
Here's an updated commit with -v
P:\repotest>hg commit -m commit2 -v
committing files:
file2.txt
committing manifest
committing changelog
Traceback (most recent call last):
  File "mercurial\dispatch.pyo", line 140, in _runcatch
  File "mercurial\dispatch.pyo", line 860, in _dispatch
  File "mercurial\dispatch.pyo", line 621, in runcommand
  File "mercurial\dispatch.pyo", line 951, in _runcommand
  File "mercurial\dispatch.pyo", line 922, in checkargs
  File "mercurial\dispatch.pyo", line 857, in <lambda>
  File "mercurial\util.pyo", line 711, in check
  File "mercurial\commands.pyo", line 1505, in commit
  File "mercurial\cmdutil.pyo", line 2212, in commit
  File "mercurial\commands.pyo", line 1499, in commitfunc
  File "mercurial\localrepo.pyo", line 64, in wrapper
  File "mercurial\localrepo.pyo", line 1397, in commit
  File "mercurial\localrepo.pyo", line 64, in wrapper
  File "mercurial\localrepo.pyo", line 1499, in commitctx
  File "mercurial\transaction.pyo", line 25, in _active
  File "mercurial\transaction.pyo", line 408, in close
  File "mercurial\transaction.pyo", line 465, in _writeundo
  File "mercurial\util.pyo", line 733, in copyfile
  File "shutil.pyo", line 82, in copyfile
IOError: [Errno 2] No such file or directory: 'P:\\repotest\\.hg/store\\journal.backup.fncache'
abort: No such file or directory: P:\repotest\.hg/store\journal.backup.fncache
transaction abort!
failed to recover fncache
rollback completed

P:\repotest>
Comment 4 Deanna Earley 2015-02-20 11:12 UTC
Created attachment 1812 [details]
Screenshot of Explorer failing

I've just tried again, deleting the .hg folder to test again and Explorer complained about the exact same file not existing.
Comment 5 metolone 2015-02-20 12:51 UTC
My work group has been reporting this error as well.  I can reproduce using similar steps reported by Deanna Earley:

Windows 7 64-bit, connected to a network share, using the released tortoisehg-3.3.0-x64.msi:

M:\>hg init x
M:\>cd x
M:\x>echo >file1
M:\x>hg ci -Am 1
adding file1
M:\x>echo >file2
M:\x>hg ci -Am 2
adding file2
abort: No such file or directory: M:\x\.hg/store\journal.backup.fncache
transaction abort!
failed to recover fncache
rollback completed
Comment 6 metolone 2015-02-20 12:53 UTC
We also saw corruption of one repo similar to bloops@gmx.de.  Many missing revlogs.  At the time we thought a network error was the cause but now are not so sure.
Comment 7 metolone 2015-02-20 13:07 UTC
Yesterday I pushed to a network share with 3.3 and saw that rollback message.  I just ran "hg verify" and got a large number of:

data/.hgignore.i@0: missing revlog!
data/.hgtags.i@165: missing revlog!
.etc....
Comment 8 Deanna Earley 2015-02-21 06:20 UTC
Testing with mercurial 3.2 and 3.3 to a Linux samba server worked without issue in the few tests I've done.
This suggests the most likely cause is an NTFS/Windows based file server?

(This was from a different machine, 32-bit Windows 8.1. I'll test 3.2 on the original machine on Monday)
Comment 9 Deanna Earley 2015-02-23 12:38 UTC
Testing with 3.2.1 worked on all 5 attempts:

P:\repotest>hg --version
Mercurial Distributed SCM (version 3.2.1)
(see http://mercurial.selenic.com for more information)

Copyright (C) 2005-2014 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

P:\repotest>hg init -v

P:\repotest>hg add file1.txt -v
adding file1.txt

P:\repotest>hg commit -m commit1 -v
file1.txt
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 0:497348a10e64

P:\repotest>hg add file2.txt -v
adding file2.txt

P:\repotest>hg commit -m commit2 -v
file2.txt
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 1:cc54f10e1464

P:\repotest>
Comment 10 Deanna Earley 2015-02-23 12:42 UTC
And 5 attempts with 3.2.3 without issue:

P:\repotest>hg init -v

P:\repotest>hg add file1.txt -v
adding file1.txt

P:\repotest>hg commit -m commit1 -v
file1.txt
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 0:27d689bfb5b9

P:\repotest>hg add file2.txt -v
adding file2.txt

P:\repotest>hg commit -m commit2 -v
file2.txt
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 1:15570712774c

P:\repotest>hg --version
Mercurial Distributed SCM (version 3.2.3)
(see http://mercurial.selenic.com for more information)

Copyright (C) 2005-2014 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

P:\repotest>
Comment 11 Deanna Earley 2015-02-23 12:50 UTC
3.2.4 worked too, and as soon as I tried 3.3.0, it failed again.
Comment 12 Deanna Earley 2015-02-25 06:27 UTC
I've looking at differences in transactions.py between 3.2.4 and 3.3.
I can see that the backup files are included in the undo (#d251da5e0e841971cc54f2fc1fc4d70743056b5c) which includes a hard link from the backup file to the undo file (The point the stack trace shows the crash).
However... in close(), immediately before _writeundo() is called, this file is unlinked on line 400.
Is this a very likely candidate for the file no longer existing? :p

Speculation here, but when used on a local file system, hard links work and may keep the name around if it's associated with a file open under another name.
When writing to a windows network share/file system that doesn't support hard links, that association is gone immediately, resulting in the file not found.
Comment 13 Yuya Nishihara 2015-02-25 07:11 UTC
> in close(), immediately before _writeundo() is called, this file is
> unlinked on line 400

unlink() is called for temporary files, "if not f". _writeundo() ignores them.

> When writing to a windows network share/file system that doesn't support hard
> links, that association is gone immediately, resulting in the file not found.

If I understand it, procmon.CSV shows that the network share causing this issue
does support hardlinks.  I can't reproduce the problem with VBoxSharedFolderFS
on Windows XP, which doesn't seem to support hardlinks.
Comment 14 Deanna Earley 2015-02-25 09:15 UTC
Thanks, I now see where the unlink is called only on temporary files..

hg 3.3 pushing a new file to an existing repository on a remote share also triggers the same thing:

C:\Debug\repo>hg push -v p:\repotest
pushing to p:\repotest
searching for changes
1 changesets found
uncompressed size of bundle content:
     195 (changelog)
     151 (manifests)
     121  file2.txt
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
Traceback (most recent call last):
  File "mercurial\dispatch.pyo", line 140, in _runcatch
  File "mercurial\dispatch.pyo", line 860, in _dispatch
  File "mercurial\dispatch.pyo", line 621, in runcommand
  File "mercurial\dispatch.pyo", line 951, in _runcommand
  File "mercurial\dispatch.pyo", line 922, in checkargs
  File "mercurial\dispatch.pyo", line 857, in <lambda>
  File "mercurial\util.pyo", line 711, in check
  File "mercurial\extensions.pyo", line 149, in wrap
  File "mercurial\util.pyo", line 711, in check
  File "hgext\mq.pyo", line 3483, in mqcommand
  File "mercurial\util.pyo", line 711, in check
  File "mercurial\commands.pyo", line 5115, in push
  File "mercurial\extensions.pyo", line 194, in wrap
  File "hggit\util.pyo", line 42, in inner
  File "hggit\__init__.pyo", line 261, in exchangepush
  File "mercurial\exchange.pyo", line 224, in push
  File "mercurial\exchange.pyo", line 647, in _pushchangeset
  File "mercurial\localrepo.pyo", line 129, in unbundle
  File "mercurial\exchange.pyo", line 1300, in unbundle
  File "mercurial\changegroup.pyo", line 876, in addchangegroup
  File "mercurial\transaction.pyo", line 25, in _active
  File "mercurial\transaction.pyo", line 408, in close
  File "mercurial\transaction.pyo", line 465, in _writeundo
  File "mercurial\util.pyo", line 733, in copyfile
  File "shutil.pyo", line 82, in copyfile
IOError: [Errno 2] No such file or directory: 'p:\\repotest\\.hg/store\\journal.backup.fncache'
abort: No such file or directory: p:\repotest\.hg/store\journal.backup.fncache
transaction abort!
failed to recover fncache
rollback completed

C:\Debug\repo>
Comment 15 Deanna Earley 2015-02-25 09:17 UTC
Oh, and the Windows mklink command doesn't work on network shares and requires a local NTFS file system. I've not tried doing it programmatically though.
Comment 16 Matt Mackall 2015-02-25 10:29 UTC
*** Bug 4545 has been marked as a duplicate of this bug. ***
Comment 17 Adrian Buehlmann 2015-02-26 04:22 UTC
Some Windows network share implementations notoriously cause troubles with hardlinks. Either don't use them or don't create hardlinks on them.

I've been hacking on Mercurial in the past around file access, unlinking and hardlinking on Windows, but I'm not currently contributing code changes. I'm also not using recent versions of Mercurial myself at the moment.

If anyone wants to have a look into fixing issues like this one here, I suggest having a look at http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows and what has been done in vfs.__call__ in scmutil.py to break hardlinks before writing files. For example, if I recall correctly, there were implementations which reported wrong numbers of hardlinks, depending on whether the file was in open state or not.

It's pretty easy to waste a lot of time with this kind of stuff.
Comment 18 Matt Mackall 2015-03-01 19:30 UTC
*** Bug 4533 has been marked as a duplicate of this bug. ***
Comment 19 Matt Mackall 2015-03-02 01:09 UTC
Huh. First off.. we know hardlinks and CIFS do in fact work moderately well, because they've been on by default for clones for nearly a decade.

But there's also been a very long history of quirky corner cases with hardlinks and CIFS and it's pretty clear to us that Microsoft isn't very serious about them. 

I suspect we're seeing a cache coherency issue that looks like this:

client           OS        server
does X exist?
               no idea, let's ask
                            nope
               nope
               (lemme remember that)
hardlink Y to X
               one sec
                            success!
               success!
delete X
               didn't I just tell you X doesn't exist?

I speculate that this works differently on Samba because Samba inserts a cache invalidation message when it does the hardlink that Microsoft has omitted or delayed.

In the interest of safety, I'm just going to disable hardlinking in the transaction code until we can figure out what's going on here.
Comment 20 HG Bot 2015-03-02 02:00 UTC
Fixed by http://selenic.com/repo/hg/rev/e5ce49a30146
Matt Mackall <mpm@selenic.com>
transaction: disable hardlink backups (issue4546)

Causing troubles, simplest fix.

(please test the fix)
Comment 21 Gavin Erry 2015-03-02 04:04 UTC
(In reply to comment #20)

Fix does not work for me (bug 4533).  Get exactly the same problem described in 4533.

Gavin
Comment 22 Gavin Erry 2015-03-02 04:08 UTC
(In reply to comment #21)

Sorry forgot to say this was with tagged version 3.3.1, changeset 5b4ed033390b

Gavin
Comment 23 Matt Harbison 2015-03-02 10:42 UTC
(In reply to comment #22)

The fix was to change 'if hardlink:' -> 'if False or hardlink:'.  'Or' should have been 'and'.

I sent a patch to fix (from work), but don't see it on the ML yet, so maybe it's in a spam filter.  If it doesn't appear later tonight, I'll resend.
Comment 24 HG Bot 2015-03-02 11:45 UTC
Fixed by http://selenic.com/repo/hg/rev/07a92bbd02e5
Matt Harbison <matt_harbison@yahoo.com>
transaction: really disable hardlink backups (issue4546)

(please test the fix)
Comment 25 Deanna Earley 2015-03-03 03:58 UTC
The workaround now allows commits and pushes to work as expected.

P:\repotest>hg --version
Mercurial Distributed SCM (version 3.3.2)
(see http://mercurial.selenic.com for more information)

Copyright (C) 2005-2014 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

P:\repotest>hg init -v

P:\repotest>hg add file1.txt -v
adding file1.txt

P:\repotest>hg commit -m commit1 -v
committing files:
file1.txt
committing manifest
committing changelog
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 0:204c100d5c5a

P:\repotest>hg add file2.txt -v
adding file2.txt

P:\repotest>hg commit -m commit2 -v
committing files:
file2.txt
committing manifest
committing changelog
calling hook commit.lfiles: hgext.largefiles.reposetup.checkrequireslfiles
committed changeset 1:0a8f3a469a28

P:\repotest>

I'm still more than happy to test full fixes when you come to work on this.
Thanks

(I'm not resolving this in case you want to leave it open for the full fix)
Comment 26 Bugzilla 2015-03-11 01:00 UTC
Bug was set to TESTING for 7 days, resolving
Comment 27 Deanna Earley 2015-03-31 12:12 UTC
Reopened so the case isn't forgotten about following a discussion in #mercurial on IRC.
The workaround in 07a92bbd02e5 allows it to work for now.
Comment 28 Matt Mackall 2015-03-31 13:42 UTC
Opened a new bug #4580 for the residual issue here, re-closing.