Equinix usa.t1.alpinelinux.org fails to boot with linux-lts 6.6.28
When upgrading usa.t1.alpinelinux.org to Alpine 3.19, which ships linux-lts 6.6.28, the machine crashes on boot:
unable to handle page fault for address: 0000000000001118
Full stack trace:
* Loading hardware drivers ...[ 17.826580] BUG: unable to handle page fault for address: 0000000000001118
[ 17.834437] #PF: supervisor read access in kernel mode
[ 17.840496] #PF: error_code(0x0000) - not-present page
[ 17.846523] PGD 0 P4D 0
[ 17.849932] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 17.855160] CPU: 12 PID: 92 Comm: kworker/12:0 Not tainted 6.6.28-0-lts #1-Alpine
[ 17.863517] Hardware name: Supermicro SSG-6029P-E1CR12L-PH004/X11DPH-T, BIOS 3.5 05/19/2021
[ 17.872749] Workqueue: events work_for_cpu_fn
[ 17.877996] RIP: 0010:esw_port_metadata_get+0x19/0x30 [mlx5_core]
[ 17.885291] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 d3 e8 4e ea ad d3 48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03 31 c0 80 23 01 5b 31 d2 31 ff c3 cc cc cc
[ 17.905904] RSP: 0018:ffffb32d88953ba8 EFLAGS: 00010246
[ 17.912069] RAX: 0000000000000000 RBX: ffffb32d88953bfc RCX: 0000000000000000
[ 17.920148] RDX: ffffb32d88953bfc RSI: 0000000000000013 RDI: 0000000000000000
[ 17.928226] RBP: ffffb32d88953c20 R08: 0000000000000000 R09: 0000000000000000
[ 17.936300] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc108e5c0
[ 17.944379] R13: ffff8e0bd1b8ff60 R14: ffff8e0bc7158000 R15: 0000000000000000
[ 17.952464] FS: 0000000000000000(0000) GS:ffff8e235f800000(0000) knlGS:0000000000000000
[ 17.961511] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.968223] CR2: 0000000000001118 CR3: 000000254682e003 CR4: 00000000007706e0
[ 17.976329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 17.984429] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 17.992522] PKRU: 55555554
[ 17.996170] Call Trace:
[ 17.999554] <TASK>
[ 18.002583] ? __die+0x23/0x80
[ 18.006560] ? page_fault_oops+0x171/0x4f0
[ 18.011573] ? exc_page_fault+0x7f/0x190
[ 18.016400] ? asm_exc_page_fault+0x26/0x30
[ 18.021504] ? esw_port_metadata_get+0x19/0x30 [mlx5_core]
[ 18.028193] ? esw_port_metadata_get+0x12/0x30 [mlx5_core]
[ 18.034843] devlink_nl_param_fill.constprop.0+0xcd/0x640
[ 18.041134] ? __alloc_skb+0x8c/0x1b0
[ 18.045681] devlink_param_notify.constprop.0+0x82/0xe0
[ 18.051793] devl_params_register+0x129/0x220
[ 18.057034] esw_offloads_init+0x171/0x190 [mlx5_core]
[ 18.063321] mlx5_eswitch_init+0x3ae/0x640 [mlx5_core]
[ 18.069593] mlx5_init_one_devl_locked+0x17b/0x610 [mlx5_core]
[ 18.076479] probe_one+0x333/0x500 [mlx5_core]
[ 18.081952] local_pci_probe+0x42/0xa0
[ 18.086509] work_for_cpu_fn+0x17/0x30
[ 18.091039] process_one_work+0x171/0x340
[ 18.095810] worker_thread+0x28c/0x3b0
[ 18.100294] ? __pfx_worker_thread+0x10/0x10
[ 18.105283] kthread+0xe5/0x120
[ 18.109115] ? __pfx_kthread+0x10/0x10
[ 18.113539] ret_from_fork+0x31/0x50
[ 18.117789] ? __pfx_kthread+0x10/0x10
[ 18.122198] ret_from_fork_asm+0x1b/0x30
[ 18.126777] </TASK>
[ 18.129607] Modules linked in: mlx5_core(+) pci_hyperv_intf input_leds evdev joydev mousedev intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate hed tpm_crb tpm_tis tpm_tis_core tpm rng_core ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad hid_generic usbhid hid ses enclosure button crc32_pclmul i2c_i801 i2c_smbus xhci_pci xhci_pci_renesas xhci_hcd usbcore usb_common ast i2c_algo_bit mpt3sas raid_class scsi_transport_sas nvme nvme_core nvme_common hwmon ahci libahci libata wmi simpledrm drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks loop sd_mod t10_pi crc64_rocksoft crc64 scsi_mod scsi_common ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2
[ 18.217785] CR2: 0000000000001118
[ 18.221806] ---[ end trace 0000000000000000 ]---
[ 18.285368] RIP: 0010:esw_port_metadata_get+0x19/0x30 [mlx5_core]
[ 18.292485] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 d3 e8 4e ea ad d3 48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03 31 c0 80 23 01 5b 31 d2 31 ff c3 cc cc cc
[ 18.312742] RSP: 0018:ffffb32d88953ba8 EFLAGS: 00010246
[ 18.318732] RAX: 0000000000000000 RBX: ffffb32d88953bfc RCX: 0000000000000000
[ 18.326629] RDX: ffffb32d88953bfc RSI: 0000000000000013 RDI: 0000000000000000
[ 18.334517] RBP: ffffb32d88953c20 R08: 0000000000000000 R09: 0000000000000000
[ 18.342396] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc108e5c0
[ 18.350274] R13: ffff8e0bd1b8ff60 R14: ffff8e0bc7158000 R15: 0000000000000000
[ 18.358161] FS: 0000000000000000(0000) GS:ffff8e235f800000(0000) knlGS:0000000000000000
[ 18.367021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.373529] CR2: 0000000000001118 CR3: 000000254682e003 CR4: 00000000007706e0
[ 18.381440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.389358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.397270] PKRU: 55555554
[ 18.400761] note: kworker/12:0[92] exited with irqs disabled
Downgrading the kernel to 6.1.87 from Alpine 3.18 allows it to boot.
This is a bare-metal server from Equinix (s3.xlarge.x86).
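
For anyone triaging this, the register dump seems consistent with a NULL pointer being dereferenced at a fixed offset inside esw_port_metadata_get. The sketch below is only a reading aid, not a diagnosis: it takes the tail of the `Code:` line and the RAX and CR2 values from the oops above, decodes the one instruction marked with `<..>`, and checks that base plus displacement equals the faulting address. The function names are made up for this example, and the decoder handles only the single `mov r32, [r64 + disp32]` encoding that appears here.

```python
# Values copied from the oops above (tail of the "Code:" line, RAX, CR2).
CODE_TAIL = "48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03"
RAX = 0x0000000000000000
CR2 = 0x0000000000001118


def faulting_bytes(code_line):
    """Return the bytes starting at the instruction the kernel marked with <..>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_disp32(insn):
    """Decode only 'mov r32, [r64 + disp32]' (opcode 0x8b, ModRM mod=10, no SIB).

    Returns (reg, rm, disp): destination register number, base register
    number, and the signed 32-bit displacement.
    """
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("encoding not handled by this sketch")
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    return reg, rm, disp


reg, rm, disp = decode_mov_disp32(faulting_bytes(CODE_TAIL))
# Register number 0 is eax/rax, so the base register here is RAX.
fault_addr = (RAX + disp) & 0xFFFFFFFFFFFFFFFF
print(hex(fault_addr))  # 0x1118
```

The computed address matches CR2, and the instruction just before it (`48 8b 80 b0 09 00 00`) loads RAX from memory, so the pointer that was loaded appears to have been NULL when the devlink param getter ran during probe.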