libatomic is broken on RISC-V
This is a follow-up to #12817 (closed) to gather all information/discussion on this libatomic bug in one place.
Below is my current understanding of the issue.
Problem Description
On riscv64, some libatomic functions simply call themselves recursively
until a stack overflow occurs. As an example, take a look at the
disassembly for __atomic_compare_exchange_1:
0000000000001e40 <__atomic_compare_exchange_1@plt>:
1e40: 00003e17 auipc t3,0x3
1e44: 2a0e3e03 ld t3,672(t3) # 50e0 <__atomic_compare_exchange_1+0x2312>
1e48: 000e0367 jalr t1,t3
1e4c: 00000013 nop
0000000000002dce <__atomic_compare_exchange_1>:
2dce: 1141 addi sp,sp,-16
2dd0: 4701 li a4,0
2dd2: 4695 li a3,5
2dd4: e406 sd ra,8(sp)
2dd6: 86aff0ef jal ra,1e40 <__atomic_compare_exchange_1@plt>
2dda: 60a2 ld ra,8(sp)
2ddc: 0141 addi sp,sp,16
2dde: 8082 ret
This incorrect assembly seems to be caused by the
0040-configure-Add-enable-autolink-libatomic-use-in-LINK_.patch patch,
which was added in d9ac288e (CC: @ddevault) and enabled by default on riscv64 in
9a634161 (CC: @clandmeter). This patch passes -latomic to every linker
invocation on riscv64. This includes the test code compiled and linked as
part of the libatomic ./configure script, which attempts to determine
whether atomic builtins are available on the current architecture [1]. My
current understanding is that, because this test code is also linked
against -latomic, it resolves to the libatomic functions instead of the
compiler builtins, so the script incorrectly determines which builtins are
available on RISC-V. Instead of expanding the compiler builtin
__atomic_compare_exchange inline (which is not available for 1-byte
operands on RISC-V), the code above therefore calls itself recursively.
For example, this is what the output of the libatomic ./configure script
regarding __atomic_compare_exchange builtins should look like on RISC-V:
checking for __atomic_compare_exchange for size 1... no
checking for __atomic_compare_exchange for size 2... no
checking for __atomic_compare_exchange for size 4... yes
checking for __atomic_compare_exchange for size 8... yes
checking for __atomic_compare_exchange for size 16... no
This is what it actually looks like at the moment:
checking for __atomic_compare_exchange for size 1... yes
checking for __atomic_compare_exchange for size 2... yes
checking for __atomic_compare_exchange for size 4... yes
checking for __atomic_compare_exchange for size 8... yes
checking for __atomic_compare_exchange for size 16... yes
That is, the ./configure script believes atomic compiler builtins to be
available even if they are not. Presently, any software linked against
libatomic that uses an atomic function whose builtin was misdetected by
the ./configure script will crash due to a stack overflow.
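To make the recursion easier to picture, here is a minimal sketch of the wrapper pattern as I understand it. The function name sketch_compare_exchange_1 is hypothetical (chosen so it does not clash with the real symbol), and the memory orders are an assumption based on the `li a3,5` / `li a4,0` constants in the disassembly above, not a copy of the libatomic source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for libatomic's 1-byte compare-exchange wrapper.
 * The real library symbol is __atomic_compare_exchange_1; a different
 * name is used here so the sketch does not clash with it. */
bool sketch_compare_exchange_1(uint8_t *ptr, uint8_t *expected, uint8_t desired)
{
    /* If the target can expand this builtin inline, the compiler emits
     * atomic instructions here. If it cannot, the compiler emits a call
     * to the external symbol __atomic_compare_exchange_1 instead. Inside
     * libatomic that symbol is the wrapper itself, so the call recurses.
     * Memory orders are assumed: seq_cst on success, relaxed on failure. */
    return __atomic_compare_exchange_n(ptr, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}
```

On an architecture with inline 1-byte atomics (such as x86_64) this compiles to a plain compare-and-swap instruction. The wrapper is only safe to build this way if ./configure correctly reported the builtin as available for that size.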
RISC-V and -latomic
The 0040-configure-Add-enable-autolink-libatomic-use-in-LINK_.patch
seems to have been added in the first place because code which uses
builtin atomics or C11 atomics needs to be explicitly linked with
-latomic on RISC-V but not on many other popular architectures
(such as x86_64) [2]. Without the patch, this supposedly causes a
lot of RISC-V specific build failures.
However, it seems that this is not entirely RISC-V specific but also a
problem on other platforms [3]. I briefly spoke to
the Debian RISC-V folks on #debian-riscv, and according to them “the
same problem affects the Debian armel/mipsel/m68k/powerpc/sh4 ports”.
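As an illustration of the kind of code affected, here is a small sketch using C11 atomics on a 1-byte object. The names counter and bump_counter are made up for this example. My assumption is that on targets without inline support for 1-byte atomics, GCC lowers the operation to a call to a libatomic function such as __atomic_fetch_add_1, which is why linking fails without -latomic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 1-byte atomic counter, zero-initialized. */
static _Atomic uint8_t counter;

/* Atomically increments the counter and returns its previous value.
 * Where the compiler cannot expand this inline, it is assumed to emit
 * a call into libatomic, so the program must be linked with -latomic. */
uint8_t bump_counter(void)
{
    return atomic_fetch_add(&counter, 1);
}
```

On x86_64 this builds and links without -latomic because the operation is expanded inline.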
Solutions
I am not familiar with GCC compiler internals and thus unsure what the
best way to fix this issue would be. Debian manually adds -latomic to
packages which need it instead of patching GCC. I have a slight
preference towards doing the same, as the compiler patch we employ
presently seems to have unintended side effects. If we don't want to
manually modify LDFLAGS for affected packages, we can also add
[ "$CARCH" = "riscv64" ] && LDFLAGS="$LDFLAGS -latomic"
to /etc/abuild.conf.
Alternatively, the patch
0040-configure-Add-enable-autolink-libatomic-use-in-LINK_.patch would
need to be adjusted somehow, either by allowing the libatomic
autolinking to be disabled and/or by patching the libatomic configure script. My understanding of the GCC code base is insufficient, so I don't feel personally comfortable modifying the patch.
This is what I have gathered so far about this issue. If my understanding of the issue is correct, which solution would be preferable?