(random) Linux Kernel Panic when using Alpine Linux as Xen Hypervisor
For about two years I have had a problem with Alpine Linux used as a Xen hypervisor. The Dom0 kernel crashes at random with the backtrace shown below. The crash occurs after HVM VMs have been running for 2-4 weeks under “normal” operation. The VMs run various Linux distributions and Windows versions.
The hardware platform is an ASRock board with a quad-core Celeron CPU, 4-16 GB RAM and an SSD or mSATA SSD. The Alpine Linux versions range from 3.2 to 3.5, and all of them show the same kernel panic after a while.
After about a year we found that raising the I/O load triggers the panic much faster, especially when qcow2 is used as the VM disk format (about 5-30 minutes). The recipe is: start fio with 5 jobs on the Alpine hypervisor, then start 2 VMs running Alpine Linux and start fio with 5 jobs inside the VMs as well.
The fio command line used is:
- fio --name=test --rw=randrw --size=100M --numjobs=5 --time_based --runtime=8h --direct=1 --alloc-size=4096
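To start exactly the same load on the hypervisor and inside each VM, the command can be wrapped in a small script. This is only a sketch: the helper name `fio_cmd` and the `RUN` variable are made up for illustration, and the fio options are the ones from the command line above, written with plain double-hyphen options.

```shell
#!/bin/sh
# Sketch: build the fio command line from the reproduction recipe.
# fio_cmd and RUN are hypothetical names, not part of fio or Alpine.

fio_cmd() {
    # $1: optional job name (defaults to "test", as in the report).
    # All other options are taken unchanged from the report.
    printf '%s %s\n' \
        "fio --name=${1:-test} --rw=randrw --size=100M --numjobs=5" \
        "--time_based --runtime=8h --direct=1 --alloc-size=4096"
}

if [ "${RUN:-0}" = "1" ]; then
    # Actually start the 8h load; requires fio to be installed.
    sh -c "$(fio_cmd)"
else
    # Default: only print the command so it can be checked first.
    fio_cmd
fi
```

Running the script without `RUN=1` only prints the command; with `RUN=1` it starts the load in the current directory, so it should be started from the filesystem that is meant to be stressed.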
NOTE: When LVM is used as disk storage, the issue only appears after several hours of running fio. Sometimes even the 8h test completes successfully, but the panic can still occur in the following test, so qcow2 makes the problem easier to reproduce quickly.
NOTE2: When we use Citrix XenServer on the exact same hardware with the same recipe, we cannot trigger this kernel panic. That is why I opened the bug here at Alpine and not in the Xen or Linux kernel bug trackers.
NOTE3: I tried both the grsec and vanilla kernels and it makes no difference. With Alpine 3.2 we used a Linux version 3 kernel and now we are at version 4.4, and it still happens.
The next thing I want to do is compile my own kernel. Unfortunately the Alpine howto on making a kernel package is not very complete, so I hope it is not too difficult.
Please help me resolve this issue. I would love to use Alpine Linux, as it is a really nice distribution, but these stability problems might force me to switch to XenServer in the future :(
The kernel panic has the following backtrace:
Dec 13 16:17:07 beronet-hv kern.err kernel: [ 311.062755] list_add corruption. next->prev should be prev (ffff880188d147f8), but was ffffc9000d5a3cf0. (next=ffffea0005daef20).
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.062825] ------------[ cut here ]------------
Dec 13 16:17:07 beronet-hv kern.crit kernel: [ 311.067020] kernel BUG at /home/buildozer/aports/main/linux-grsec/src/linux-4.4/lib/list_debug.c:32!
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.071703] invalid opcode: 0000 [#1] SMP
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.075062] Modules linked in: xt_physdev br_netfilter iptable_filter ip_tables x_tables bridge stp llc ipv6 dm_mod nbd xenfs xen_privcmd xen_evtchn xen_gntdev xen_gntalloc xen_blkback xen_netback tun af_packet snd_hda_codec_
Dec 13 16:17:07 beronet-hv kern.info kernel: m i2c_designware_core snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform pwm_lpss_platform pwm_lpss snd_soc_rt5670 snd_soc_core snd_pcm snd_timer snd_compress snd soundcore snd_soc_rl6231 regmap_i2c i2c_core xhci_pc
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.115546] CPU: 2 PID: 25492 Comm: qemu-system-i38 Not tainted 4.4.68-0-grsec #1-Alpine
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.119605] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./IMB-155, BIOS P1.80 03/22/2017
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.123715] task: ffff88017ef298c0 ti: ffff88017ef2b140 task.ti: ffff88017ef2b140
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.127818] RIP: e030:[] [] __list_add_debug+0x1f/0x62
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.131966] RSP: e02b:ffffc9000d5a3c20 EFLAGS: 00010082
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.136053] RAX: 0000000000000075 RBX: ffffea0005daef20 RCX: 0000000000000000
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.140135] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880188d0c888
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.144156] RBP: ffffc9000d5a3c20 R08: ffffffff815c4d01 R09: 0000000000000002
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.148145] R10: 0000000000000000 R11: ffffffff818580ad R12: ffff880188d147f8
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.152118] R13: ffffea0005daef20 R14: ffffea0005daef00 R15: 0000000000000201
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.156113] FS: 00006e3d7da15ab0(0000) GS:ffff880188d00000(0000) knlGS:0000000000000000
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.160184] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.164652] CR2: 00006e3d8007a908 CR3: 000000017774b000 CR4: 0000000000002660
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.168674] Stack:
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.172724] ffffc9000d5a3c48 ffffffff81209b02 0000000000000002 ffff880188d147d8
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.176909] 0000000000000000 ffffc9000d5a3c58 ffffffff81209b25 ffffc9000d5a3c90
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.181130] ffffffff810e2e78 ffffc9000d5a3cd0 0000000000000000 ffffc9000d5a3cf0
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.185346] Call Trace:
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.189565] [] __pax_list_add+0x17/0x31
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.193849] [] __list_add+0x9/0xb
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.199307] [] free_hot_cold_page+0xbb/0x103
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.203613] [] free_hot_cold_page_list+0x2f/0x43
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.207905] [] release_pages+0x8a/0x221
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.212263] [] ? lru_cache_add_file+0x16/0x16
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.216647] [] pagevec_lru_move_fn+0xc9/0xdd
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.221061] [] __pagevec_lru_add+0x12/0x14
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.225489] [] lru_add_drain_cpu+0x26/0xa3
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.230005] [] lru_add_drain+0x10/0x12
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.234454] [] unmap_region+0x49/0x10a
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.238902] [] ? do_futex+0xd7/0x957
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.243387] [] ? xen_load_sp0+0x6a/0x7c
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.247876] [] ? vma_rb_erase+0x1f4/0x248
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.252371] [] do_munmap+0x28d/0x325
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.256858] [] vm_munmap+0x3d/0x55
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.261490] [] SyS_munmap+0x1e/0x24
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.266435] [] entry_SYSCALL_64_fastpath+0x12/0x71
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.270734] Code: 89 7f 08 74 05 e8 55 68 e4 ff c9 c3 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 74 11 48 89 c1 48 c7 c7 04 6a 5d 81 e8 42 30 ed ff <0f>0b 48 8b 32 48 39 f0 74 17 48 89 d1 48 c7 c7 9a 6a 5d 81 48
Dec 13 16:17:07 beronet-hv kern.alert kernel: [ 311.279725] RIP [] __list_add_debug+0x1f/0x62
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.283937] RSP
Dec 13 16:17:07 beronet-hv kern.warn kernel: [ 311.287996] ---[ end trace c3b496f413cf7616 ]---
Dec 13 16:17:08 beronet-hv kern.err kernel: [ 311.339291] list_del corruption, ffffea0005daef20->next is LIST_POISON1 (00000000ffffff02)
Dec 13 16:17:08 beronet-hv kern.warn kernel: [ 311.343374] ------------[ cut here ]------------
Dec 13 16:17:08 beronet-hv kern.crit kernel: [ 311.347235] kernel BUG at /home/buildozer/aports/main/linux-grsec/src/linux-4.4/lib/list_debug.c:75!
Dec 13 16:17:08 beronet-hv kern.warn kernel: [ 311.351122] invalid opcode: 0000 [#2] SMP
Dec 13 16:17:08 beronet-hv kern.warn kernel: [ 311.354896] Modules linked in: xt_physdev br_netfilter iptable_filter ip_tables x_tables bridge stp llc ipv6 dm_mod nbd xenfs xen_privcmd xen_evtchn xen_gntdev xen_gntalloc xen_blkback xen_netback tun af_packet snd_hda_codec_
Dec 13 16:17:08 beronet-hv kern.info kernel: m i2c_designware_core snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform pwm_lpss_platform pwm_lpss snd_soc_rt5670 snd_soc_core snd_pcm snd_timer snd_compress snd soundcore snd_soc_rl6231 regmap_i2c i2c_core xhci_pc
(from redmine: issue id 8282, created on 2017-12-13)