[Bug 5935] New: Cannot push immediately after commit, but one second later it works

mercurial-bugs at mercurial-scm.org
Wed Jul 4 04:39:29 UTC 2018


https://bz.mercurial-scm.org/show_bug.cgi?id=5935

            Bug ID: 5935
           Summary: Cannot push immediately after commit, but one second
                    later it works
           Product: Mercurial
           Version: 4.0
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: Mercurial
          Assignee: bugzilla at mercurial-scm.org
          Reporter: nicolas.barbier at gmail.com
                CC: mercurial-devel at mercurial-scm.org

Hey Mercurial developers,

We think we have stumbled on a very confusing race condition, probably related
to the handling of hardlinks that are used when cloning locally.

Situation:

* We have some "shared by using setgid" directories: two users belong to the
same group, both the original repo and the clone are in directories that have
that common group as their group, the setgid bit is set, and the users have a
umask of 002. This way, both users can read/write/delete/etc. files and
directories in this whole directory hierarchy.
* We have a script that commits something (a change in a single file) and then
immediately pushes that commit.
* The push is over the filesystem, from some other directory (not in /var) to
/var/local/hg/XXX/YYY on the same filesystem (so hardlinking is possible).
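
The sharing setup described above can be checked with a small sketch like the
following. It is only an illustration: the helper names describe_shared_dir and
current_umask are made up for this example and are not part of Mercurial.

```python
import os
import stat


def describe_shared_dir(path):
    """Report the directory properties the shared setup relies on."""
    st = os.stat(path)
    return {
        # New entries inherit the directory's group when setgid is set.
        "setgid": bool(st.st_mode & stat.S_ISGID),
        # Group members may create and delete entries in the directory.
        "group_writable": bool(st.st_mode & stat.S_IWGRP),
        "gid": st.st_gid,
    }


def current_umask():
    """Return the process umask (it can only be read by setting it)."""
    old = os.umask(0)
    os.umask(old)
    return old
```

For the setup to work as described, every directory in the hierarchy should
report setgid and group_writable as True with the common group's gid, and
current_umask() should return 0o002 for both users.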

Problem:

* This push fails the first time (right after we cloned the repo from
/var/local/hg/XXX/YYY) as follows. We use set -x, so the commands have a "+" in
front of them:

+ umask
0002
+ id
uid=1015(siemen) gid=1016(siemen) groups=1016(siemen),1013(XXX)
+ ls -l /var/local/hg/XXX/YYY/.hg/store/00changelog.i
-rw-rw-r-- 2 itsme XXX 18760 Jun  8 15:48 /var/local/hg/XXX/YYY/.hg/store/00changelog.i
+ hg commit -u 'Release <release at ZZZ.com>' -m 'Changed artifact version to QQQ.'
+ ls -l /var/local/hg/XXX/YYY/.hg/store/00changelog.i
-rw-rw-r-- 1 itsme XXX 18760 Jun  8 15:48 /var/local/hg/XXX/YYY/.hg/store/00changelog.i
+ hg push /var/local/hg/XXX/YYY
pushing to /var/local/hg/XXX/YYY
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
transaction abort!
rollback failed - please run hg recover
abort: Operation not permitted: '/var/local/hg/XXX/YYY/.hg/store/00changelog.i'

Further analysis:

* It seems that Mercurial is trying to do something with this file
/var/local/hg/XXX/YYY/.hg/store/00changelog.i for which it doesn't have the
permission (e.g., change the ownership or permissions), because it is running
as "siemen" while the file is owned by "itsme". It is not clear to me what it
is trying to do, so improving the error message to include some context would
be great.
* We had quite a few such repositories that all had the exact same problem,
so we could try multiple times.
* When we put a "read A" in between the commit and the push, and waited for
about a second before pressing enter, the push reproducibly succeeded, while
before we added that wait, it reproducibly failed. Waiting only a fraction of a
second was not enough; it had to be about a second or so.
* Before, we did the pushing over SSH (to another server). The problem only
started once we switched to filesystem-level clones.
* After we either recovered the target repo and repushed manually, or after the
push succeeded because of the wait, we didn't have the problem anymore. The
difference seems to be that 00changelog.i is not hardlinked anymore, but
replaced with a copy.
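
To narrow down which kind of operation is being refused, a small probe like the
one below could be run as the pushing user against the target's 00changelog.i.
It is a sketch, not a statement about what Mercurial does:
probe_owner_only_ops is a hypothetical helper. It relies on the Linux rule that
changing a file's mode, or setting its timestamps to explicit values, requires
owning the file (or being privileged), while group write permission is enough
to modify the contents. "Operation not permitted" is the message for EPERM.

```python
import errno
import os
import stat


def probe_owner_only_ops(path):
    """Try operations on path and report which ones are refused.

    Returns a dict mapping an operation name to None on success, or to the
    errno name (for example "EPERM") on failure. The mode and timestamps
    are rewritten with their current values, so the file is left as it was.
    """
    st = os.stat(path)
    ops = {
        # Requires ownership, even when the mode does not actually change.
        "chmod": lambda: os.chmod(path, stat.S_IMODE(st.st_mode)),
        # Explicit timestamps require ownership; write access is not enough.
        "utime_explicit": lambda: os.utime(
            path, ns=(st.st_atime_ns, st.st_mtime_ns)
        ),
        # Opening for append only needs write permission; nothing is written.
        "open_for_append": lambda: open(path, "ab").close(),
    }
    results = {}
    for name, op in ops.items():
        try:
            op()
            results[name] = None
        except OSError as exc:
            results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results
```

On a file that the caller owns, every entry is None. For an unprivileged group
member who does not own the file, the expectation is "EPERM" for chmod and
utime_explicit and None for open_for_append, which would match a push that can
append to the file but then aborts.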

My guess:

It seems that there is some kind of race condition in the test that tries
to determine whether a file is still hardlinked or not. That race condition
seems to be between two different processes that run in sequence (the push only
starts after the commit has already ended). It seems that 00changelog.i in the
target repo still looks hardlinked to the pushing process, even though the
previous commit should have made a copy. Might it be that the kernel keeps the
file descriptor around for a short while after the process that had the file
open has already ended? And that an open file descriptor increases the hardlink
count, because it should prevent the file from being removed physically?
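
The last question can be checked in isolation. On Linux the link count
reported by stat (the second column of "ls -l") counts directory entries only,
and open file descriptors are tracked separately, so they should not show up in
it. The sketch below records the count at each step for a scratch file;
link_count_demo is a name invented for this example.

```python
import os
import tempfile


def link_count_demo():
    """Return the st_nlink values seen while linking and opening a file."""
    counts = {}
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "wb") as f:
            f.write(b"data")
            f.flush()
            # One name, one open descriptor.
            counts["open_descriptor"] = os.fstat(f.fileno()).st_nlink
        os.link(a, b)
        # Two names, no open descriptor.
        counts["after_link"] = os.stat(a).st_nlink
        with open(a, "rb") as f:
            # Two names plus an open descriptor.
            counts["linked_and_open"] = os.stat(a).st_nlink
            os.unlink(b)
            # One name left, descriptor still open.
            counts["after_unlink_other_name"] = os.fstat(f.fileno()).st_nlink
    return counts
```

If the counts come out as 1, 2, 2 and 1, an open descriptor does not make a
file look hardlinked, and the one-second window would need another explanation.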

I assume that this problem might not have been detected before because it
requires the combination of the "directory sharing through setgid", locally
cloned repositories, and a first push that happens right (less than a second)
after a commit.

Sorry for not having more time to investigate this better. I hope that this
rings a bell for someone that knows the Mercurial code.

Versions:

Mercurial 4.0 (as packaged by Debian)
Debian 9.4
Linux 4.9.0-6-amd64
Filesystem: ext4

Greetings,

Nicolas
