Segfault on mdadm (because of Musl 1.2.2?)
Hi,
I have updated Alpine Linux from 3.12.0 to 3.13.0.
Since then, I see some segfaults in my logs:
mdadm[10989]: segfault at 7f9e094311db ip 00007f9e09483792 sp 00007fff90024c58 error 4 in ld-musl-x86_64.so.1[7f9e09448000+48000]
Code: 84 c0 74 0d 48 8d 3c 06 48 8d 1c 01 48 39 c2 75 e3 c6 03 00 e8 0a 00 00 00 48 29 eb 5a 48 01 d8 5b 5d c3 48 89 f8 a8 07 74 0a <80> 38 00 74 3b 48 ff c0 eb f2 49 b8 ff fe fe fe fe fe fe fe 48 be
and
mdadm[11387]: segfault at 7fac20d6d283 ip 00007fac20dbf792 sp 00007ffe4047f2f8 error 4 in ld-musl-x86_64.so.1[7fac20d84000+48000]
Code: 84 c0 74 0d 48 8d 3c 06 48 8d 1c 01 48 39 c2 75 e3 c6 03 00 e8 0a 00 00 00 48 29 eb 5a 48 01 d8 5b 5d c3 48 89 f8 a8 07 74 0a <80> 38 00 74 3b 48 ff c0 eb f2 49 b8 ff fe fe fe fe fe fe fe 48 be
udevd[11249]: 'mdadm --incremental --export /dev/sde1 --offroot /dev/disk/by-id/scsi-36d0946604c1723002791c133fe851a9d-part1 /dev/disk/by-id/wwn-0x6d0946604c1723002791c133fe851a9d-part1 /dev/disk/by-partlabel/\x2fboot /dev/disk/by-partuuid/ce0c13cb-39f1-4c47-9fca-b895ae4c71ca /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:4:0-part1' [11387] terminated by signal 11 (Segmentation fault)
They are triggered by udev that executes mdadm --incremental /dev/sda1.
And indeed, if I run the command manually, I get a segfault:
# mdadm --incremental /dev/sda1
Segmentation fault
With gdb I got more information:
(gdb) run --incremental /dev/sda1
Starting program: mdadm --incremental /dev/sda1
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fb7792 in strlen () from /lib/ld-musl-x86_64.so.1
(gdb) backtrace
#0 0x00007ffff7fb7792 in strlen () from /lib/ld-musl-x86_64.so.1
#1 0x00007ffff7fb75ea in strdup () from /lib/ld-musl-x86_64.so.1
#2 0x0000000000002000 in ?? ()
#3 0x00007ffff7f5f133 in ?? ()
#4 0x00007fffffffa8d8 in ?? ()
#5 0x00005555555a7514 in ?? ()
#6 0x00005555555605d0 in ?? ()
#7 0x0000555555561473 in ?? ()
#8 0x000055555557f92e in ?? ()
#9 0x000055555555dcc8 in ?? ()
#10 0x00007ffff7f83a03 in ?? () from /lib/ld-musl-x86_64.so.1
#11 0x00007ffff7f839dc in ?? () from /lib/ld-musl-x86_64.so.1
#12 0x00007fffffffec00 in ?? ()
#13 0x0000000000000000 in ?? ()
And with ltrace:
opendir("/dev/disk/by-path/") = 0x7f4227613030
readdir(0x7f4227613030) = 0x7f4227613048
readdir(0x7f4227613030) = 0x7f4227613060
readdir(0x7f4227613030) = 0x7f4227613078
strncpy(0x7ffe0ca3ce4a, "pci-0000:01:00.0-scsi-0:2:2:0-pa"..., 4078) = 0x7ffe0ca3ce4a
stat(0x7ffe0ca3ce38, 0x7ffe0ca3cda8, 3, 0) = 0
readdir(0x7f4227613030) = 0x7f42276130b0
strncpy(0x7ffe0ca3ce4a, "pci-0000:01:00.0-scsi-0:2:2:0-pa"..., 4078) = 0x7ffe0ca3ce4a
stat(0x7ffe0ca3ce38, 0x7ffe0ca3cda8, 3, 0) = 0
readdir(0x7f4227613030) = 0x7f42276130e8
strncpy(0x7ffe0ca3ce4a, "pci-0000:01:00.0-scsi-0:2:0:0-pa"..., 4078) = 0x7ffe0ca3ce4a
stat(0x7ffe0ca3ce38, 0x7ffe0ca3cda8, 3, 0) = 0
closedir(0x7f4227613030) = 0
strdup("\373\374\375\376\377" <no return ...>
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
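One possible reading of the ltrace output above, offered as a hypothesis rather than a confirmed diagnosis: strdup() is called on garbage right after closedir(), which is what you would see if a dirent name were read after its directory stream had been closed. POSIX allows closedir() to invalidate the pointer returned by readdir(), and as far as I know musl 1.2.1 switched to a new allocator (mallocng) that is less forgiving of such accesses. The sketch below is not mdadm code; first_entry_name is a hypothetical helper that only illustrates the safe ordering (copy the name first, close the stream afterwards).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT taken from mdadm: return a heap-allocated copy
 * of the first entry name in `dir` (skipping "." and ".."), or NULL if the
 * directory cannot be opened or has no such entry. Caller frees the result. */
char *first_entry_name(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return NULL;

    char *name = NULL;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        /* Copy while the stream is still open: ent points into storage
         * owned by the DIR stream. */
        name = strdup(ent->d_name);
        break;
    }

    closedir(d);
    /* Calling strdup(ent->d_name) at this point instead would read memory
     * that closedir() may already have released (use-after-free). */
    return name;
}
```

If mdadm does something like the unsafe ordering, it could explain why the old musl appeared to work (the freed memory happened to stay readable) while the new one crashes in strlen().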
I copied ld-musl-x86_64.so.1 from Alpine 3.12.0 and I don't have the segfault anymore (just a "Resource busy" which seems expected).
It seems that musl has been updated from 1.1.24 to 1.2.2, so I suspect the problem lies there (the mdadm binary is the same between 3.12.0 and 3.13.0).
Note that I don't have a segfault if "/dev/disk/by-path/" is not populated.
Right now I don't know how to investigate this further. Don't hesitate to ask me for more information if I can help :)