libatomic is broken on RISC-V
This is a follow-up to #12817 (closed) to gather all information and discussion on this libatomic bug in one place.
Below is my current understanding of the issue.
On riscv64, some libatomic functions simply call themselves recursively
until a stack overflow occurs. As an example, take a look at the
disassembly of __atomic_compare_exchange_1:

    0000000000001e40 <__atomic_compare_exchange_1@plt>:
      1e40: 00003e17  auipc t3,0x3
      1e44: 2a0e3e03  ld    t3,672(t3) # 50e0 <__atomic_compare_exchange_1+0x2312>
      1e48: 000e0367  jalr  t1,t3
      1e4c: 00000013  nop

    0000000000002dce <__atomic_compare_exchange_1>:
      2dce: 1141      addi  sp,sp,-16
      2dd0: 4701      li    a4,0
      2dd2: 4695      li    a3,5
      2dd4: e406      sd    ra,8(sp)
      2dd6: 86aff0ef  jal   ra,1e40 <__atomic_compare_exchange_1@plt>
      2dda: 60a2      ld    ra,8(sp)
      2ddc: 0141      addi  sp,sp,16
      2dde: 8082      ret

This incorrect assembly seems to be caused by the GCC patch
which was added in d9ac288e (CC: @ddevault) and enabled by default on riscv64 in
9a634161 (CC: @clandmeter). This patch adds
-latomic to every linker invocation on riscv64. This includes
the test code compiled and linked as part of the libatomic ./configure
script, which attempts to determine whether atomic builtins are available
on the current architecture. My current
understanding is that, because this test code is also linked with
-latomic, it uses the libatomic functions instead of the
compiler builtins and thus incorrectly determines which builtins are
available on RISC-V. Since the compiler builtin
__atomic_compare_exchange is not available for this size on RISC-V, the
compiler emits a call to the libatomic function instead, and the code
above therefore calls itself recursively.
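
To look for this symptom in a built library, the disassembly can be scanned for functions that jump to their own PLT stub. The following is a minimal sketch, not a definitive check: the function name find_plt_self_calls is hypothetical, and it assumes the usual `objdump -d` layout shown above. A self-call through the PLT is not a bug in general (legitimately recursive functions match too), so any hit only marks a candidate for closer inspection.

```shell
# Hypothetical helper: reads `objdump -d` output on stdin and prints every
# function that references its own PLT stub, i.e. calls itself via the PLT.
find_plt_self_calls() {
    awk '
        # Function header, e.g. "0000000000002dce <some_function>:"
        /^[0-9a-f]+ <[^>]+>:/ {
            current = $2
            gsub(/[<>:]/, "", current)
            next
        }
        # Instruction line whose target is "<current@plt>"
        current != "" && index($0, "<" current "@plt>") {
            if (!(current in seen)) {
                print current
                seen[current] = 1
            }
        }
    '
}
```

It could be used as `objdump -d path/to/libatomic.so | find_plt_self_calls`, where the library path is a placeholder.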
For example, this is what the output of the libatomic ./configure script for the
__atomic_compare_exchange builtins should look like on RISC-V:

    checking for __atomic_compare_exchange for size 1... no
    checking for __atomic_compare_exchange for size 2... no
    checking for __atomic_compare_exchange for size 4... yes
    checking for __atomic_compare_exchange for size 8... yes
    checking for __atomic_compare_exchange for size 16... no

This is what it actually looks like at the moment:

    checking for __atomic_compare_exchange for size 1... yes
    checking for __atomic_compare_exchange for size 2... yes
    checking for __atomic_compare_exchange for size 4... yes
    checking for __atomic_compare_exchange for size 8... yes
    checking for __atomic_compare_exchange for size 16... yes

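
The difference between the two outputs can also be checked mechanically. The sketch below assumes that the expected riscv64 results are exactly the ones quoted above (sizes 1, 2 and 16 reported as "no"), which may not hold for other toolchain versions. The function name misdetected_cas_sizes is hypothetical.

```shell
# Hypothetical helper: reads ./configure output on stdin and prints every
# size for which __atomic_compare_exchange was reported as available even
# though the expected riscv64 result quoted above is "no".
misdetected_cas_sizes() {
    sed -n 's/^checking for __atomic_compare_exchange for size \([0-9][0-9]*\)\.\.\. yes$/\1/p' |
    while read -r size; do
        case "$size" in
            1|2|16) echo "$size" ;;   # expected "no" on riscv64
        esac
    done
}
```

An empty result means the detection matches the expected output. Any printed size marks a builtin that was detected incorrectly.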
That is, the ./configure script believes atomic compiler builtins to
be available even if they are not. Presently, software linked against
libatomic which uses atomic functions for which the builtins have not
been detected correctly by the ./configure script will crash due to a
stack overflow.

RISC-V and -latomic

The patch seems to have been added in the first place since code which uses
builtin atomics or C11 atomics needs to be explicitly linked with
-latomic on RISC-V but not on many other popular architectures
(such as x86_64), which supposedly causes a
lot of RISC-V-specific build failures.
However, it seems that this is not entirely RISC-V specific but also a
problem on other platforms. I briefly spoke to
the Debian RISC-V folks on #debian-riscv and according to them “the
same problem affects the Debian armel/mipsel/m68k/powerpc/sh4 ports”.
I am not familiar with GCC compiler internals and thus unsure what the
best way to fix this issue would be. Debian manually adds -latomic to
packages which need it instead of patching GCC. I have a slight
preference towards doing the same, as the compiler patch we employ
presently seems to have unintended side effects. If we don't want to modify
LDFLAGS for affected packages we can also add
[ "$CARCH" = "riscv64" ] && LDFLAGS="$LDFLAGS -latomic" to
Alternatively, the patch would
need to be adjusted somehow, either by allowing the libatomic
autolinking to be disabled and/or by patching the libatomic configure script. My understanding of the GCC code base is insufficient, so I don't feel comfortable modifying the patch myself.
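
The riscv64-only LDFLAGS line quoted above can be written as a small function that is easier to test in isolation. This is only a sketch of the idea: the function name ldflags_with_atomic and the example flag values are hypothetical, and where such a helper would live is left open.

```shell
# Hypothetical helper: prints the given linker flags, with -latomic
# appended on riscv64 only.
#   $1: target architecture (the value of CARCH)
#   $2: existing LDFLAGS
ldflags_with_atomic() {
    carch=$1
    ldflags=$2
    if [ "$carch" = "riscv64" ]; then
        case " $ldflags " in
            *" -latomic "*) ;;   # already present, leave unchanged
            *) ldflags="${ldflags:+$ldflags }-latomic" ;;
        esac
    fi
    printf '%s\n' "$ldflags"
}
```

For example, `LDFLAGS=$(ldflags_with_atomic "$CARCH" "$LDFLAGS")` leaves the flags untouched on every architecture other than riscv64 and avoids adding -latomic twice.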
This is what I have gathered so far about this issue. If my understanding of the issue is correct, which solution would be preferable?