abuild-fetch: try to work around an ESTALE error which occurs on NFS
Hello,
abuild-fetch can fail with an ESTALE error when the destination directory is
on an NFS file system and more than one process, running on different hosts,
tries to lock the same file at the same time:
### HOST 1 ###
$ for i in `seq 10`; do echo === $i; ./abuild-fetch -d /nfs https://curl.se/download/curl-8.7.1.tar.xz; done
=== 1
=== 2
=== 3
=== 4
=== 5
=== 6
=== 7
abuild-fetch: failed to acquire lock: /nfs/curl-8.7.1.tar.xz.lock: Stale file handle
=== 8
=== 9
=== 10
### HOST 2 ###
$ for i in `seq 10`; do echo === $i; ./abuild-fetch -d /nfs https://curl.se/download/curl-8.7.1.tar.xz; done
=== 1
abuild-fetch: failed to acquire lock: /nfs/curl-8.7.1.tar.xz.lock: Stale file handle
=== 2
=== 3
abuild-fetch: failed to acquire lock: /nfs/curl-8.7.1.tar.xz.lock: Stale file handle
=== 4
=== 5
=== 6
=== 7
=== 8
abuild-fetch: failed to acquire lock: /nfs/curl-8.7.1.tar.xz.lock: Stale file handle
=== 9
=== 10
This is caused by the following race condition between two processes, A and B:
             A                 |         B
                               |
  lockfd = open(lockfile, ...) |
                               | unlink(lockfile)
  lockf(lockfd, F_LOCK, 0)     |
According to https://nfs.sourceforge.net/#faq_a10, to recover from an ESTALE
error, an application must close the file or directory where the error
occurred, and reopen it so the NFS client can resolve the pathname again and
retrieve the new file handle.
This merge request adds code that makes several attempts to recover from an
ESTALE error. It does not fully fix the issue, but it makes hitting it much
less likely.
FWIW, I have hit this issue only with an NFS server running on FreeBSD. I couldn't reproduce it with an NFS server running on Linux.