Last weekend I updated two KVM hosts (x86_64) running Alpine 3.11.6 to 3.12.0.
Since upgrading, the libvirt service (libvirtd) is not starting correctly.
Two libvirtd processes are spawned, and interaction with libvirt (e.g. via virsh or remotely over TLS) doesn't work. Any virsh command just hangs until I press Ctrl+C to stop it.
If I manually kill one of the libvirtd processes, libvirt seems to start working again. I can execute virsh commands without any issues, same with remote access.
I found this other issue that may be related but I'm not really sure: 11361
The problem does not only happen at startup. I'm still investigating, but even while libvirt is already running, after some time a second process is spawned (I don't know why yet) and libvirt stops working again. I then need to connect to the KVM host and kill the newly spawned libvirtd process, after which operation resumes as usual.
Any ideas?
Thanks.
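A minimal sketch of how the extra process described above could be spotted without eyeballing the process list. It assumes process-list text in the three-column form printed by `ps -o pid,ppid,comm` (assuming your ps supports that option); the function name `libvirtd_children` and the sample listing are made up for illustration. It only reports PIDs and does not kill anything.

```python
def libvirtd_children(ps_output: str, name: str = "libvirtd") -> list[int]:
    """Return PIDs of `name` processes whose parent is also a `name` process.

    A lingering libvirtd whose parent is the main libvirtd matches the
    symptom described in this issue (a second, stuck process).
    """
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not "PID PPID COMMAND".
        if len(fields) != 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        procs[int(fields[0])] = (int(fields[1]), fields[2].strip())

    return sorted(
        pid
        for pid, (ppid, comm) in procs.items()
        if comm == name and procs.get(ppid, (0, ""))[1] == name
    )


if __name__ == "__main__":
    # Hypothetical listing; real input would come from `ps -o pid,ppid,comm`.
    sample = (
        "PID   PPID  COMMAND\n"
        "1     0     init\n"
        "2000  1     libvirtd\n"
        "2450  2000  libvirtd\n"
        "2100  1     virtlogd\n"
    )
    print(libvirtd_children(sample))
```

libvirtd forks short-lived helpers in normal operation too, so a PID reported once is not proof of a hang; a PID that is still there minutes later is the one worth attaching gdb to.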
This is the challenge at the moment, replicating the issue.
Tomorrow I will raise libvirt logging level to info or debug to try to catch the problem or at least to get some hints of what can be causing it and report back my findings (if any).
I don’t have any spare server to test it unless I mix libvirt from edge with the rest of packages from 3.12.
Correct me if I’m wrong but since I’m running from RAM (and lbu) I should be able to enable edge, upgrade libvirt, test, comment edge repos and just reboot to be back in the original situation. Right?
The libvirtd status is "crashed" on Alpine 3.14. I manually started and stopped the service many times and installed the latest libvirt version, 9.4.0, but the service still won't start. Can someone help me out?
Starting virtlogd ...
Error relocating /usr/sbin/virtlogd: virIdentityGetProcessID: symbol not found
Error relocating /usr/sbin/virtlogd: virNetServerClientSetQuietEOF: symbol not found
Error relocating /usr/sbin/virtlogd: virIdentityGetGroupName: symbol not found
Error relocating /usr/sbin/virtlogd: virNetServerAddServiceUNIX: symbol not found
Error relocating /usr/sbin/virtlogd: virIdentityGetUserName: symbol not found
Error relocating /usr/sbin/virtlogd: virNetServerUpdateTlsFiles: symbol not found
Error relocating /usr/sbin/virtlogd: virSystemdGetActivation: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListAddInt: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListAddBoolean: symbol not found
Error relocating /usr/sbin/virtlogd: virDaemonForkIntoBackground: symbol not found
Error relocating /usr/sbin/virtlogd: virSystemdActivationComplete: symbol not found
Error relocating /usr/sbin/virtlogd: virFileActivateDirOverrideForProg: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListFree: symbol not found
Error relocating /usr/sbin/virtlogd: virPipe: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListAddString: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListAddUInt: symbol not found
Error relocating /usr/sbin/virtlogd: virSystemdActivationFree: symbol not found
Error relocating /usr/sbin/virtlogd: virDaemonUnixSocketPaths: symbol not found
Error relocating /usr/sbin/virtlogd: virTypedParamListStealParams: symbol not found
Error relocating /usr/sbin/virtlogd: virDaemonSetupLogging: symbol not found
start-stop-daemon: failed to start `/usr/sbin/virtlogd'
Failed to start virtlogd [ !! ]
ERROR: virtlogd failed to start
ERROR: cannot start libvirtd as virtlogd would not start
Any chance libvirt 6.4.0 can be backported to 3.12?
Libvirt 6.3 seems to be really broken and is causing me a lot of issues (VM management, VM backups, etc.).
If 6.4 will not be backported to 3.12, I will have to roll back to Alpine 3.11.6 while I wait for Alpine 3.12.1, since I can't risk mixing main and edge and causing other problems on the KVM hosts.
I experienced this issue today after upgrading my KVM box from 3.11 to 3.12. I checked its libvirt version, and the new 6.5.0 was installed during the upgrade.
The only non-standard factors I can think of are that it's an AMD Ryzen 1700 with IOMMU and NPT enabled (kernel cmdline: amd_iommu=on iommu=pt kvm_amd.npt=1), because I use it for PCI passthrough with a SAS controller.
However, this issue occurred even before starting the VM with the PCI device.
Unfortunately it's a fairly critical machine, so I had to boot it into its old Debian installation and didn't even get to try 3.12 without the IOMMU/NPT kernel options.
Will look into it and debug further as soon as I get the opportunity.
Does anybody else affected by this issue use AMD hardware and/or IOMMU?
I'm getting it on fresh installs of Alpine 3.12 on a Ryzen 2700X and 3700X, no IOMMU or any special params. I'm also getting it on Intel Sandy Bridge (i5-2400).
I don’t think this issue is hardware related. I’m having this problem with Intel Xeon Ivy Bridge.
I tried to run libvirtd manually (the binary directly, not through OpenRC) but I got the same result. It’s as if libvirtd spawns another process to do something, but the new process just gets stuck (waiting?).
I’m wondering if either running libvirtd in debug mode or through strace would help to understand what’s going on.
Not sure strace would help as it is the second process that seems to be stuck not the main one.
I searched the web for any similar issue reported upstream to no avail.
I have never used strace before, so I don't really know how to interpret this, but attached is the output of strace -o libvirt.log -s 1000 libvirtd -v: libvirt.log
Starting libvirtd like that will hang virsh 50% of the time.
Strangely enough, running strace -o libvirt.log -f -s 1000 libvirtd -v will NEVER hang.
This is almost surely a known upstream bug in libvirt due to use of non-AS-safe functions after fork from a multithreaded parent. To fix it, in src/util/vircommand.c, remove all of the log fiddling (virLogReset, virLogSetFromEnv, and virReportSystemError) from virExec. This may leave fd leaks to child; if so, the log-opening code needs to be fixed to properly set close-on-exec for fd's it uses.
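To illustrate the failure mode described above in isolation, here is a minimal sketch in Python rather than C. It is not libvirt or musl code: the `threading.Lock` merely stands in for a lock such as the allocator's, and the name `child_gets_lock` is hypothetical. One thread holds the lock while another thread forks; the child contains only the forking thread, so nothing in the child can ever release the lock. A timeout is used so the demo terminates, whereas the real child would wait forever.

```python
import os
import threading


def child_gets_lock(held_by_other_thread: bool, timeout: float = 0.5) -> bool:
    """Fork, then report whether the child managed to take the lock."""
    lock = threading.Lock()      # stand-in for a library-internal lock
    locked = threading.Event()
    release = threading.Event()

    def holder() -> None:
        # Second thread: take the lock and keep it until told to let go.
        with lock:
            locked.set()
            release.wait()

    worker = None
    if held_by_other_thread:
        worker = threading.Thread(target=holder)
        worker.start()
        locked.wait()            # ensure the lock is held when we fork

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, but the lock's
        # "held" state was copied, so this can only time out.
        ok = lock.acquire(timeout=timeout)
        os._exit(0 if ok else 1)

    # Parent: collect the child's verdict, then let the holder finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    if worker is not None:
        worker.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("lock free at fork:", child_gets_lock(False))
    print("lock held by another thread at fork:", child_gets_lock(True))
```

The same reasoning applies to any function that takes an internal lock (logging, allocation) when it is called in the child between fork and exec, which is why the suggestion above is to remove those calls from that code path.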
Thanks for bringing this upstream. I went through the issue there, and it doesn't look promising: the libvirt devs are just deflecting the issue onto musl. I will keep looking for updates there.
The upstream issue is closed, but I still can't properly shut down libvirtd, and once it is started, virt-manager often (most of the time) cannot connect to it. This is on edge with libvirt 6.5.0.
Yes, upstream is not interested in fixing this bug so someone just needs to make the fix I described as a patch for Alpine and any other interested distros to carry. It should be very simple, just removing all the logging-related code in the affected codepath.
Updated the init script from Gentoo, merged in commit 4aba5959.
I've tested the init script on my workstation and libvirt starts fine.
So, with the init script updated and the patch applied, the issue is probably almost fixed.
Can you provide feedback?
The patches do not solve the problem completely, and it may still deadlock if there are any errors to report. I still think it makes sense to backport the patches I upstreamed, as they remove some of the deadlocks and at least my system became usable again.
@ncopa "usable again" sounds much, much better than what we have at the moment (backups getting stuck, management interface suddenly not working anymore, having to connect to the KVM hosts to manually kill the child processes, etc).
Thanks a lot for looking into this and engaging upstream to help fixing it as well.
With the latest packages the service startup lock doesn't seem to occur anymore, but unfortunately it still happens randomly while the service is running and processing requests.
Any other ideas on how to mitigate this? I don't want to go back to Alpine 3.11.6 except as a last resort.
Quick question.
After installing libvirt-dbg, should I restart the libvirtd service as usual, or should I stop it and manually start the libvirtd process with a command similar to the below before doing the repro:
Please find the gdb output below. I'm not sure if the debug symbols were loaded correctly, so let me know if I need to try again:
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 18541
Reading symbols from /usr/sbin/libvirtd...
(No debugging symbols found in /usr/sbin/libvirtd)
Reading symbols from /usr/lib/libvirt-lxc.so.0...
(No debugging symbols found in /usr/lib/libvirt-lxc.so.0)
Reading symbols from /usr/lib/libvirt-qemu.so.0...
(No debugging symbols found in /usr/lib/libvirt-qemu.so.0)
Reading symbols from /usr/lib/libvirt.so.0...
(No debugging symbols found in /usr/lib/libvirt.so.0)
Reading symbols from /usr/lib/libtirpc.so.3...
(No debugging symbols found in /usr/lib/libtirpc.so.3)
Reading symbols from /usr/lib/libdbus-1.so.3...
(No debugging symbols found in /usr/lib/libdbus-1.so.3)
Reading symbols from /usr/lib/libgobject-2.0.so.0...
(No debugging symbols found in /usr/lib/libgobject-2.0.so.0)
Reading symbols from /usr/lib/libglib-2.0.so.0...
(No debugging symbols found in /usr/lib/libglib-2.0.so.0)
Reading symbols from /usr/lib/libintl.so.8...
(No debugging symbols found in /usr/lib/libintl.so.8)
Reading symbols from /usr/lib/libgcc_s.so.1...
(No debugging symbols found in /usr/lib/libgcc_s.so.1)
Reading symbols from /lib/ld-musl-x86_64.so.1...
Reading symbols from /usr/lib/debug//lib/ld-musl-x86_64.so.1.debug...
Reading symbols from /usr/lib/libcap-ng.so.0...
(No debugging symbols found in /usr/lib/libcap-ng.so.0)
Reading symbols from /usr/lib/libyajl.so.2...
(No debugging symbols found in /usr/lib/libyajl.so.2)
Reading symbols from /usr/lib/libnl-3.so.200...
(No debugging symbols found in /usr/lib/libnl-3.so.200)
Reading symbols from /usr/lib/libxml2.so.2...
(No debugging symbols found in /usr/lib/libxml2.so.2)
Reading symbols from /usr/lib/libgio-2.0.so.0...
(No debugging symbols found in /usr/lib/libgio-2.0.so.0)
Reading symbols from /usr/lib/libsasl2.so.3...
(No debugging symbols found in /usr/lib/libsasl2.so.3)
Reading symbols from /usr/lib/libgnutls.so.30...
(No debugging symbols found in /usr/lib/libgnutls.so.30)
Reading symbols from /usr/lib/libcurl.so.4...
(No debugging symbols found in /usr/lib/libcurl.so.4)
Reading symbols from /usr/lib/libgssapi_krb5.so.2...
(No debugging symbols found in /usr/lib/libgssapi_krb5.so.2)
Reading symbols from /usr/lib/libffi.so.7...
(No debugging symbols found in /usr/lib/libffi.so.7)
Reading symbols from /usr/lib/libpcre.so.1...
(No debugging symbols found in /usr/lib/libpcre.so.1)
Reading symbols from /lib/libz.so.1...
(No debugging symbols found in /lib/libz.so.1)
Reading symbols from /usr/lib/liblzma.so.5...
(No debugging symbols found in /usr/lib/liblzma.so.5)
Reading symbols from /usr/lib/libgmodule-2.0.so.0...
(No debugging symbols found in /usr/lib/libgmodule-2.0.so.0)
Reading symbols from /lib/libmount.so.1...
(No debugging symbols found in /lib/libmount.so.1)
Reading symbols from /usr/lib/libp11-kit.so.0...
(No debugging symbols found in /usr/lib/libp11-kit.so.0)
Reading symbols from /usr/lib/libunistring.so.2...
(No debugging symbols found in /usr/lib/libunistring.so.2)
Reading symbols from /usr/lib/libtasn1.so.6...
(No debugging symbols found in /usr/lib/libtasn1.so.6)
Reading symbols from /usr/lib/libnettle.so.7...
(No debugging symbols found in /usr/lib/libnettle.so.7)
Reading symbols from /usr/lib/libhogweed.so.5...
(No debugging symbols found in /usr/lib/libhogweed.so.5)
Reading symbols from /usr/lib/libgmp.so.10...
(No debugging symbols found in /usr/lib/libgmp.so.10)
Reading symbols from /usr/lib/libnghttp2.so.14...
(No debugging symbols found in /usr/lib/libnghttp2.so.14)
Reading symbols from /lib/libssl.so.1.1...
(No debugging symbols found in /lib/libssl.so.1.1)
Reading symbols from /lib/libcrypto.so.1.1...
(No debugging symbols found in /lib/libcrypto.so.1.1)
Reading symbols from /usr/lib/libkrb5.so.3...
(No debugging symbols found in /usr/lib/libkrb5.so.3)
Reading symbols from /usr/lib/libk5crypto.so.3...
(No debugging symbols found in /usr/lib/libk5crypto.so.3)
Reading symbols from /lib/libcom_err.so.2...
(No debugging symbols found in /lib/libcom_err.so.2)
Reading symbols from /usr/lib/libkrb5support.so.0...
(No debugging symbols found in /usr/lib/libkrb5support.so.0)
Reading symbols from /lib/libblkid.so.1...
(No debugging symbols found in /lib/libblkid.so.1)
Reading symbols from /usr/lib/libkeyutils.so.1...
(No debugging symbols found in /usr/lib/libkeyutils.so.1)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_network.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_network.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_interface.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_interface.so)
Reading symbols from /usr/lib/libnetcf.so.1...
(No debugging symbols found in /usr/lib/libnetcf.so.1)
Reading symbols from /lib/libudev.so.1...
(No debugging symbols found in /lib/libudev.so.1)
Reading symbols from /usr/lib/libaugeas.so.0...
(No debugging symbols found in /usr/lib/libaugeas.so.0)
Reading symbols from /usr/lib/libexslt.so.0...
(No debugging symbols found in /usr/lib/libexslt.so.0)
Reading symbols from /usr/lib/libxslt.so.1...
(No debugging symbols found in /usr/lib/libxslt.so.1)
Reading symbols from /usr/lib/libnl-route-3.so.200...
(No debugging symbols found in /usr/lib/libnl-route-3.so.200)
Reading symbols from /usr/lib/libfa.so.1...
(No debugging symbols found in /usr/lib/libfa.so.1)
Reading symbols from /usr/lib/libgcrypt.so.20...
(No debugging symbols found in /usr/lib/libgcrypt.so.20)
Reading symbols from /usr/lib/libgpg-error.so.0...
(No debugging symbols found in /usr/lib/libgpg-error.so.0)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_secret.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_secret.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_fs.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_fs.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_logical.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_logical.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_scsi.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_scsi.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_mpath.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_mpath.so)
Reading symbols from /lib/libdevmapper.so.1.02...
(No debugging symbols found in /lib/libdevmapper.so.1.02)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_disk.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_disk.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_zfs.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_zfs.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_nodedev.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_nodedev.so)
Reading symbols from /usr/lib/libpciaccess.so.0...
(No debugging symbols found in /usr/lib/libpciaccess.so.0)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_nwfilter.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_nwfilter.so)
Reading symbols from /usr/lib/libpcap.so.1...
(No debugging symbols found in /usr/lib/libpcap.so.1)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so)
__wait (addr=addr@entry=0x7fe200bfba88 <mal+8>, waiters=waiters@entry=0x7fe200bfba8c <mal+12>, val=val@entry=1, priv=128, priv@entry=1) at ./arch/x86_64/syscall_arch.h:40
40 ./arch/x86_64/syscall_arch.h: No such file or directory.
(gdb) backtrace
#0 __wait (addr=addr@entry=0x7fe200bfba88 <mal+8>, waiters=waiters@entry=0x7fe200bfba8c <mal+12>, val=val@entry=1, priv=128, priv@entry=1) at ./arch/x86_64/syscall_arch.h:40
#1 0x00007fe200b8c3ca in lock (lk=0x7fe200bfba88 <mal+8>) at src/malloc/malloc.c:31
#2 lock_bin (i=0) at src/malloc/malloc.c:46
#3 malloc (n=<optimized out>, n@entry=8) at src/malloc/malloc.c:320
#4 0x00007fe200b8c584 in calloc (m=<optimized out>, n=8) at src/malloc/malloc.c:361
#5 0x00007fe20062212e in g_malloc0 () from /usr/lib/libglib-2.0.so.0
#6 0x00007fe20082c732 in virAllocN () from /usr/lib/libvirt.so.0
#7 0x00007fe20092fa50 in ?? () from /usr/lib/libvirt.so.0
#8 0x00007fe200882704 in virProcessRunInFork () from /usr/lib/libvirt.so.0
#9 0x00007fe20092fd22 in ?? () from /usr/lib/libvirt.so.0
#10 0x00007fe200930dbd in virSecurityManagerTransactionCommit () from /usr/lib/libvirt.so.0
#11 0x00007fe20092d8ed in ?? () from /usr/lib/libvirt.so.0
#12 0x00007fe200930dbd in virSecurityManagerTransactionCommit () from /usr/lib/libvirt.so.0
#13 0x00007fe1ff1adad0 in qemuSecurityDomainSetPathLabel () from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so
#14 0x00007fe1ff197810 in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so
#15 0x00007fe2009fbfda in virDomainScreenshot () from /usr/lib/libvirt.so.0
#16 0x0000563c9047f1b8 in ?? ()
#17 0x00007fe200951554 in virNetServerProgramDispatch () from /usr/lib/libvirt.so.0
#18 0x00007fe200955a41 in ?? () from /usr/lib/libvirt.so.0
#19 0x00007fe20089b74c in ?? () from /usr/lib/libvirt.so.0
#20 0x00007fe20089b0cb in ?? () from /usr/lib/libvirt.so.0
#21 0x00007fe200bba7b7 in start (p=0x7fe1ff738858) at src/thread/pthread_create.c:195
#22 0x00007fe200bbc8f0 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb)
I had another occurrence last night while doing VM backups.
This time the libvirt code causing the fork and lock seems to be different from the one in the previous backtrace I posted.
Please find the gdb output below:
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 2229
Reading symbols from /usr/sbin/libvirtd...
(No debugging symbols found in /usr/sbin/libvirtd)
Reading symbols from /usr/lib/libvirt-lxc.so.0...
(No debugging symbols found in /usr/lib/libvirt-lxc.so.0)
Reading symbols from /usr/lib/libvirt-qemu.so.0...
(No debugging symbols found in /usr/lib/libvirt-qemu.so.0)
Reading symbols from /usr/lib/libvirt.so.0...
(No debugging symbols found in /usr/lib/libvirt.so.0)
Reading symbols from /usr/lib/libtirpc.so.3...
(No debugging symbols found in /usr/lib/libtirpc.so.3)
Reading symbols from /usr/lib/libdbus-1.so.3...
(No debugging symbols found in /usr/lib/libdbus-1.so.3)
Reading symbols from /usr/lib/libgobject-2.0.so.0...
(No debugging symbols found in /usr/lib/libgobject-2.0.so.0)
Reading symbols from /usr/lib/libglib-2.0.so.0...
(No debugging symbols found in /usr/lib/libglib-2.0.so.0)
Reading symbols from /usr/lib/libintl.so.8...
(No debugging symbols found in /usr/lib/libintl.so.8)
Reading symbols from /usr/lib/libgcc_s.so.1...
(No debugging symbols found in /usr/lib/libgcc_s.so.1)
Reading symbols from /lib/ld-musl-x86_64.so.1...
Reading symbols from /usr/lib/debug//lib/ld-musl-x86_64.so.1.debug...
Reading symbols from /usr/lib/libcap-ng.so.0...
(No debugging symbols found in /usr/lib/libcap-ng.so.0)
Reading symbols from /usr/lib/libyajl.so.2...
(No debugging symbols found in /usr/lib/libyajl.so.2)
Reading symbols from /usr/lib/libnl-3.so.200...
(No debugging symbols found in /usr/lib/libnl-3.so.200)
Reading symbols from /usr/lib/libxml2.so.2...
(No debugging symbols found in /usr/lib/libxml2.so.2)
Reading symbols from /usr/lib/libgio-2.0.so.0...
(No debugging symbols found in /usr/lib/libgio-2.0.so.0)
Reading symbols from /usr/lib/libsasl2.so.3...
(No debugging symbols found in /usr/lib/libsasl2.so.3)
Reading symbols from /usr/lib/libgnutls.so.30...
(No debugging symbols found in /usr/lib/libgnutls.so.30)
Reading symbols from /usr/lib/libcurl.so.4...
(No debugging symbols found in /usr/lib/libcurl.so.4)
Reading symbols from /usr/lib/libgssapi_krb5.so.2...
(No debugging symbols found in /usr/lib/libgssapi_krb5.so.2)
Reading symbols from /usr/lib/libffi.so.7...
(No debugging symbols found in /usr/lib/libffi.so.7)
Reading symbols from /usr/lib/libpcre.so.1...
(No debugging symbols found in /usr/lib/libpcre.so.1)
Reading symbols from /lib/libz.so.1...
(No debugging symbols found in /lib/libz.so.1)
Reading symbols from /usr/lib/liblzma.so.5...
(No debugging symbols found in /usr/lib/liblzma.so.5)
Reading symbols from /usr/lib/libgmodule-2.0.so.0...
(No debugging symbols found in /usr/lib/libgmodule-2.0.so.0)
Reading symbols from /lib/libmount.so.1...
(No debugging symbols found in /lib/libmount.so.1)
Reading symbols from /usr/lib/libp11-kit.so.0...
(No debugging symbols found in /usr/lib/libp11-kit.so.0)
Reading symbols from /usr/lib/libunistring.so.2...
(No debugging symbols found in /usr/lib/libunistring.so.2)
Reading symbols from /usr/lib/libtasn1.so.6...
(No debugging symbols found in /usr/lib/libtasn1.so.6)
Reading symbols from /usr/lib/libnettle.so.7...
(No debugging symbols found in /usr/lib/libnettle.so.7)
Reading symbols from /usr/lib/libhogweed.so.5...
(No debugging symbols found in /usr/lib/libhogweed.so.5)
Reading symbols from /usr/lib/libgmp.so.10...
(No debugging symbols found in /usr/lib/libgmp.so.10)
Reading symbols from /usr/lib/libnghttp2.so.14...
(No debugging symbols found in /usr/lib/libnghttp2.so.14)
Reading symbols from /lib/libssl.so.1.1...
(No debugging symbols found in /lib/libssl.so.1.1)
Reading symbols from /lib/libcrypto.so.1.1...
(No debugging symbols found in /lib/libcrypto.so.1.1)
Reading symbols from /usr/lib/libkrb5.so.3...
(No debugging symbols found in /usr/lib/libkrb5.so.3)
Reading symbols from /usr/lib/libk5crypto.so.3...
(No debugging symbols found in /usr/lib/libk5crypto.so.3)
Reading symbols from /lib/libcom_err.so.2...
(No debugging symbols found in /lib/libcom_err.so.2)
Reading symbols from /usr/lib/libkrb5support.so.0...
(No debugging symbols found in /usr/lib/libkrb5support.so.0)
Reading symbols from /lib/libblkid.so.1...
(No debugging symbols found in /lib/libblkid.so.1)
Reading symbols from /usr/lib/libkeyutils.so.1...
(No debugging symbols found in /usr/lib/libkeyutils.so.1)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_network.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_network.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_interface.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_interface.so)
Reading symbols from /usr/lib/libnetcf.so.1...
(No debugging symbols found in /usr/lib/libnetcf.so.1)
Reading symbols from /lib/libudev.so.1...
(No debugging symbols found in /lib/libudev.so.1)
Reading symbols from /usr/lib/libaugeas.so.0...
(No debugging symbols found in /usr/lib/libaugeas.so.0)
Reading symbols from /usr/lib/libexslt.so.0...
(No debugging symbols found in /usr/lib/libexslt.so.0)
Reading symbols from /usr/lib/libxslt.so.1...
(No debugging symbols found in /usr/lib/libxslt.so.1)
Reading symbols from /usr/lib/libnl-route-3.so.200...
(No debugging symbols found in /usr/lib/libnl-route-3.so.200)
Reading symbols from /usr/lib/libfa.so.1...
(No debugging symbols found in /usr/lib/libfa.so.1)
Reading symbols from /usr/lib/libgcrypt.so.20...
(No debugging symbols found in /usr/lib/libgcrypt.so.20)
Reading symbols from /usr/lib/libgpg-error.so.0...
(No debugging symbols found in /usr/lib/libgpg-error.so.0)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_secret.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_secret.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_fs.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_fs.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_logical.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_logical.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_scsi.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_scsi.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_mpath.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_mpath.so)
Reading symbols from /lib/libdevmapper.so.1.02...
(No debugging symbols found in /lib/libdevmapper.so.1.02)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_disk.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_disk.so)
Reading symbols from /usr/lib/libvirt/storage-backend/libvirt_storage_backend_zfs.so...
(No debugging symbols found in /usr/lib/libvirt/storage-backend/libvirt_storage_backend_zfs.so)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_nodedev.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_nodedev.so)
Reading symbols from /usr/lib/libpciaccess.so.0...
(No debugging symbols found in /usr/lib/libpciaccess.so.0)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_nwfilter.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_nwfilter.so)
Reading symbols from /usr/lib/libpcap.so.1...
(No debugging symbols found in /usr/lib/libpcap.so.1)
Reading symbols from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so...
(No debugging symbols found in /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so)
Reading symbols from /usr/lib/libvirt/storage-file/libvirt_storage_file_fs.so...
(No debugging symbols found in /usr/lib/libvirt/storage-file/libvirt_storage_file_fs.so)
__wait (addr=addr@entry=0x7fe200bfbaa0 <mal+32>, waiters=waiters@entry=0x7fe200bfbaa4 <mal+36>, val=val@entry=1, priv=128, priv@entry=1) at ./arch/x86_64/syscall_arch.h:40
40 ./arch/x86_64/syscall_arch.h: No such file or directory.
(gdb) backtrace
#0 __wait (addr=addr@entry=0x7fe200bfbaa0 <mal+32>, waiters=waiters@entry=0x7fe200bfbaa4 <mal+36>, val=val@entry=1, priv=128, priv@entry=1) at ./arch/x86_64/syscall_arch.h:40
#1 0x00007fe200b8c3ca in lock (lk=0x7fe200bfbaa0 <mal+32>) at src/malloc/malloc.c:31
#2 lock_bin (i=1) at src/malloc/malloc.c:46
#3 malloc (n=<optimized out>, n@entry=24) at src/malloc/malloc.c:320
#4 0x00007fe200b8c584 in calloc (m=<optimized out>, n=24) at src/malloc/malloc.c:361
#5 0x00007fe20062212e in g_malloc0 () from /usr/lib/libglib-2.0.so.0
#6 0x00007fe20082c71f in virAlloc () from /usr/lib/libvirt.so.0
#7 0x00007fe200932517 in ?? () from /usr/lib/libvirt.so.0
#8 0x00007fe20092e281 in ?? () from /usr/lib/libvirt.so.0
#9 0x00007fe200882704 in virProcessRunInFork () from /usr/lib/libvirt.so.0
#10 0x00007fe20092e20e in ?? () from /usr/lib/libvirt.so.0
#11 0x00007fe20093104e in virSecurityManagerMoveImageMetadata () from /usr/lib/libvirt.so.0
#12 0x00007fe20092d6fb in ?? () from /usr/lib/libvirt.so.0
#13 0x00007fe20093104e in virSecurityManagerMoveImageMetadata () from /usr/lib/libvirt.so.0
#14 0x00007fe1ff1a8b54 in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so
#15 0x00007fe200a16732 in virDomainSnapshotCreateXML () from /usr/lib/libvirt.so.0
#16 0x0000563c90480abc in ?? ()
#17 0x00007fe200951554 in virNetServerProgramDispatch () from /usr/lib/libvirt.so.0
#18 0x00007fe200955a41 in ?? () from /usr/lib/libvirt.so.0
#19 0x00007fe20089b74c in ?? () from /usr/lib/libvirt.so.0
#20 0x00007fe20089b0cb in ?? () from /usr/lib/libvirt.so.0
#21 0x00007fe200bba7b7 in start (p=0x7fe1ff6cf858) at src/thread/pthread_create.c:195
#22 0x00007fe200bbc8f0 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb)
(gdb) set debug-file-directory /usr/lib/debug
(gdb) bt
#0 __wait (addr=addr@entry=0x7fe200bfba88 <mal+8>, waiters=waiters@entry=0x7fe200bfba8c <mal+12>, val=val@entry=1, priv=128, priv@entry=1) at ./arch/x86_64/syscall_arch.h:40
#1 0x00007fe200b8c3ca in lock (lk=0x7fe200bfba88 <mal+8>) at src/malloc/malloc.c:31
#2 lock_bin (i=0) at src/malloc/malloc.c:46
#3 malloc (n=<optimized out>, n@entry=8) at src/malloc/malloc.c:320
#4 0x00007fe200b8c584 in calloc (m=<optimized out>, n=8) at src/malloc/malloc.c:361
#5 0x00007fe20062212e in g_malloc0 () from /usr/lib/libglib-2.0.so.0
#6 0x00007fe20082c732 in virAllocN () from /usr/lib/libvirt.so.0
#7 0x00007fe20092fa50 in ?? () from /usr/lib/libvirt.so.0
#8 0x00007fe200882704 in virProcessRunInFork () from /usr/lib/libvirt.so.0
#9 0x00007fe20092fd22 in ?? () from /usr/lib/libvirt.so.0
#10 0x00007fe200930dbd in virSecurityManagerTransactionCommit () from /usr/lib/libvirt.so.0
#11 0x00007fe20092d8ed in ?? () from /usr/lib/libvirt.so.0
#12 0x00007fe200930dbd in virSecurityManagerTransactionCommit () from /usr/lib/libvirt.so.0
#13 0x00007fe1ff1adad0 in qemuSecurityDomainSetPathLabel () from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so
#14 0x00007fe1ff197810 in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so
#15 0x00007fe2009fbfda in virDomainScreenshot () from /usr/lib/libvirt.so.0
#16 0x0000563c9047f1b8 in ?? ()
#17 0x00007fe200951554 in virNetServerProgramDispatch () from /usr/lib/libvirt.so.0
#18 0x00007fe200955a41 in ?? () from /usr/lib/libvirt.so.0
#19 0x00007fe20089b74c in ?? () from /usr/lib/libvirt.so.0
#20 0x00007fe20089b0cb in ?? () from /usr/lib/libvirt.so.0
#21 0x00007fe200bba7b7 in start (p=0x7fe1fee81810) at src/thread/pthread_create.c:195
#22 0x00007fe200bbc8f0 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb)
I wish I were more proficient with gdb, sorry.
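For anyone else collecting traces like these, a minimal sketch of a checker that answers one question: is the attached process sitting in the malloc lock inside code reached from virProcessRunInFork? That is the pattern in the backtraces posted in this thread. The regular expression is my own guess at gdb's frame format based on the output here, and the names `frame_names` and `blocked_in_malloc_after_fork` are made up; it accepts text with or without line breaks between frames.

```python
import re

# "#N [0xADDR in ]FUNCTION (" as seen in the gdb output in this thread.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def frame_names(backtrace: str) -> list[str]:
    """Function names from innermost (#0) to outermost frame."""
    return [match.group(2) for match in FRAME_RE.finditer(backtrace)]


def blocked_in_malloc_after_fork(
    backtrace: str, fork_helper: str = "virProcessRunInFork"
) -> bool:
    """True if malloc is waiting on a lock in frames inside `fork_helper`."""
    names = frame_names(backtrace)
    if fork_helper not in names:
        return False
    # Frames listed before the helper are the calls made from within it.
    inner = names[: names.index(fork_helper)]
    return "malloc" in inner and ("lock" in inner or "__wait" in inner)


if __name__ == "__main__":
    # Shortened excerpt of the first backtrace posted above.
    excerpt = (
        "#0 __wait (addr=addr@entry=0x7fe200bfba88 <mal+8>, val=val@entry=1)"
        " at ./arch/x86_64/syscall_arch.h:40\n"
        "#1 0x00007fe200b8c3ca in lock (lk=0x7fe200bfba88 <mal+8>)"
        " at src/malloc/malloc.c:31\n"
        "#3 malloc (n=<optimized out>, n@entry=8) at src/malloc/malloc.c:320\n"
        "#6 0x00007fe20082c732 in virAllocN () from /usr/lib/libvirt.so.0\n"
        "#8 0x00007fe200882704 in virProcessRunInFork ()"
        " from /usr/lib/libvirt.so.0\n"
    )
    print(frame_names(excerpt))
    print(blocked_in_malloc_after_fork(excerpt))
```

A match only says that the trace has the same shape as the ones in this issue; the frames outside the helper (virDomainScreenshot, virDomainSnapshotCreateXML, and so on) are still what identifies which libvirt code path triggered it.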
I wasn't sure if I should post this in the Alpine forums or in the issue, but after reviewing upstream issues 52 and 54 it's very clear that upstream is neither planning nor interested in making libvirt compatible with musl.
So moving forward and looking for a permanent, maintainable solution, what should Alpine do (in no specific order)?
Option 1: Users (like me) can keep providing backtraces every time we find this fork & deadlock issue (which seems to happen in quite a few places: snapshots, screenshots, etc.) and developers can keep writing patches for libvirt (whenever possible). The problem with this strategy is that nobody knows in how many places the libvirt code needs to be patched to be compatible with musl, or how difficult it will be to maintain all these patches going forward.
Option 2: Musl (and don't shoot me please) changes its malloc code to be less strict in a similar way glibc seems to behave, so no patches to libvirt are needed.
Option 3: I'm not even sure this is an option at all (sorry for my lack of expertise here), but could Alpine ship glibc and compile software that is not compatible with musl against it? I know this may be impossible, so I also thought about running libvirtd in a Debian Docker container, but then I'm not sure how that setup would be able to interact with KVM and QEMU on the host node.
Sorry if I'm missing any other options I'm just thinking out loud and looking for your thoughts on this issue.
Another thing I'm still wondering is what changed between Alpine 3.12 and 3.11.6 that caused this issue, if it was libvirt code changes or musl code changes. But this is more out of curiosity since it will make no difference to the fact that there is an issue that needs to be solved in some way.
Option 2: Musl (and don't shoot me please) changes its malloc code to be less strict in a similar way glibc seems to behave, so no patches to libvirt are needed.
I think this was suggested to musl upstream and there's a new POSIX revision in the making that will make calling malloc() between fork() and exec() an OK thing to do. I think musl upstream wants to wait with implementing that until the spec is out though? (CC @dalias)
Option 3 isn't really something we can do as a distro, since a package and all of its dependencies need to be built against the same libc. If we were to build libvirt against glibc but didn't build all of its dependencies against glibc, it'd blow up during startup.
Sorry if I'm missing any other options I'm just thinking out loud and looking for your thoughts on this issue. Another thing I'm still wondering is what changed between Alpine 3.12 and 3.11.6 that caused this issue, if it was libvirt code changes or musl code changes
Apparently newer musl versions are pretty good at triggering this bug, especially musl >=1.2.1, since it introduced a new malloc (mallocng) that hits the bug often.
I think this was suggested to musl upstream and there's a new POSIX revision in the making that will make calling malloc() between fork() and exec() an OK thing to do.
That's not the case. What the change does is make what glibc's doing (breaking the async-signal-safety of fork in order to support calling malloc between fork and exec) no longer non-conforming. It's a change that relaxes the requirements on implementations. It does not relax any requirements (add any new allowances) for applications. Calling AS-unsafe functions between fork and exec is still UB. The only difference is that implementations have the flexibility to define it themselves in a meaningful way if they want.
Note that it turns out that glibc does not actually make this work; see https://www.openwall.com/lists/musl/2020/08/16/2. They only make malloc work, nothing else. There are still all sorts of other ways to shoot yourself in the foot doing AS-unsafe things between fork and exec.
I don't see why this issue keeps coming up for libvirt. I'm pretty sure the original proposed patch I posted killed all possible vectors for malloc from the child, but the patches that were actually used were a lot less drastic, and it's unclear whether they left some possibilities open. While there's also a larger issue at hand here about the entire software ecosystem, the libvirt one is completely tractable just by analyzing what was done wrong.
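For anyone following along who (like the reporter) is not deep into this: a minimal C sketch of the general pattern being asked for, assuming a hypothetical helper `run_in_fork_prealloc` that is not libvirt code and not the actual patch. Everything the child needs is allocated in the parent before fork(), and the child only calls async-signal-safe functions (write and _exit here), so it never has to take the malloc lock that frame #1 of the backtrace above is stuck on.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only (not part of libvirt).
 * Writes msg to fd from a forked child and returns the child's exit
 * status: 0 on success, 1 if the write failed, -1 on parent-side errors. */
int run_in_fork_prealloc(int fd, const char *msg)
{
    /* Allocate and fill every buffer the child needs BEFORE fork(). */
    size_t len = strlen(msg);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, msg, len + 1);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: no malloc/calloc/free, no stdio, no locks.
         * write() and _exit() are both async-signal-safe. */
        ssize_t n = write(fd, buf, len);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    /* Parent: free is fine here, we are not between fork and exec. */
    free(buf);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_in_fork_prealloc(STDOUT_FILENO, "ok\n")` should return 0. The point is only the ordering: allocation happens before fork(), never in the child. In the backtrace, virProcessRunInFork ends up in virAllocN and then calloc, which is exactly what this pattern avoids.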