OVS freezes after a short time on Supermicro X10DRi/X10DRi
Hi,
I hope to report this strange bug properly…
On a brand new box reported by dmesg as Supermicro X10DRi/X10DRi, BIOS 1.0b 09/17/2014 (2 Xeons with 8 cores each and hyperthreading, so Linux reports 32 cores, and 64 GB of RAM), running from a USB key with no disks (yet), OVS freezes after ~400 seconds when Open vSwitch is used on eth0. Only a reboot makes the interface usable again, even though “tcpdump -n -i eth0” still shows traffic while it is frozen. I still have access to the box through eth1, which is not connected to a vswitch.
dmesg reports:
[ 11.041665] ------------[ cut here ]------------
[ 11.041677] WARNING: CPU: 0 PID: 2200 at /home/buildozer/aports/main/linux-grsec/src/linux-3.14/drivers/dma/ioat/dca.c:697 ioat3_dca_init+0x16c/0x1a1 [ioatdma]()
[ 11.041679] ioatdma 0000:00:04.0: APICID_TAG_MAP set incorrectly by BIOS, disabling DCA
[ 11.041679] Modules linked in: ioatdma(+) fbcon font igb bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor tileblit ptp pps_core dca ast drm_kms_helper ttm drm agpgart fb fbdev syscopyarea sysfillrect sysimgblt i2c_algo_bit i2c_core shpchp mousedev joydev evdev hed tpm_tis tpm wmi button processor ipmi_si ipmi_msghandler acpi_power_meter hwmon isofs nls_utf8 nls_cp437 hid_generic usbhid hid vfat fat xhci_hcd ahci libahci libata usb_storage sd_mod scsi_mod crc_t10dif crct10dif_common squashfs loop
[ 11.041703] CPU: 0 PID: 2200 Comm: modprobe Not tainted 3.14.22-1-grsec #2-Alpine
[ 11.041704] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.0b 09/17/2014
[ 11.041706] 00000000124b3b78 ffffc900124b3ab8 ffffffff9934cfa8 ffffc900124b3b00
[ 11.041709] ffffc900124b3af0 ffffffff9903e9ef ffffffffa03ccec0 ffff8810370e6480
[ 11.041713] 0000000000000002 ffff881037ae5800 ffffc90013930100 ffffc900124b3b50
[ 11.041716] Call Trace:
[ 11.041722] [<ffffffff9934cfa8>] dump_stack+0x45/0x56
[ 11.041728] [<ffffffff9903e9ef>] warn_slowpath_common+0x75/0x8e
[ 11.041733] [<ffffffffa03ccec0>] ? ioat3_dca_init+0x16c/0x1a1 [ioatdma]
[ 11.041736] [<ffffffff9903ea90>] warn_slowpath_fmt_taint+0x3f/0x41
[ 11.041739] [<ffffffffa03cdabb>] ? .LC16+0x97/0x124 [ioatdma]
[ 11.041743] [<ffffffffa03cd227>] ? xor_idx_to_field+0x27/0xc5 [ioatdma]
[ 11.041747] [<ffffffffa03ccec0>] ioat3_dca_init+0x16c/0x1a1 [ioatdma]
[ 11.041750] [<ffffffffa03cc573>] ioat3_dma_probe+0x299/0x33d [ioatdma]
[ 11.041758] [<ffffffff991b8e33>] ? __pci_set_master+0x24/0x6f
[ 11.041762] [<ffffffffa03c71b4>] ioat_pci_probe+0x14d/0x174 [ioatdma]
[ 11.041765] [<ffffffff991bcd1d>] pci_device_probe+0x54/0xa3
[ 11.041770] [<ffffffff9923cb4a>] driver_probe_device+0xa4/0x1ca
[ 11.041772] [<ffffffff9923cd00>] __driver_attach+0x58/0x7a
[ 11.041786] [<ffffffff9923cca8>] ? __device_attach+0x38/0x38
[ 11.041792] [<ffffffff9923b31e>] bus_for_each_dev+0x78/0x82
[ 11.041794] [<ffffffff9923c6e8>] driver_attach+0x19/0x1b
[ 11.041796] [<ffffffff9923c3b1>] bus_add_driver+0x101/0x1cb
[ 11.041799] [<ffffffff9923d23e>] driver_register+0x89/0xc5
[ 11.041801] [<ffffffffa03d1000>] ? 0xffffffffa03d0fff
[ 11.041804] [<ffffffff991bc6a6>] __pci_register_driver+0x46/0x48
[ 11.041807] [<ffffffffa03d1089>] ioat_init_module+0x89/0x3c5d [ioatdma]
[ 11.041809] [<ffffffffa03d1000>] ? 0xffffffffa03d0fff
[ 11.041812] [<ffffffff990020d7>] do_one_initcall+0x7b/0xfd
[ 11.041817] [<ffffffff99093afb>] load_module+0x1672/0x1c8d
[ 11.041820] [<ffffffff990911d5>] ? store_uevent+0x35/0x35
[ 11.041822] [<ffffffffa03d10c0>] ? ioat_init_module+0xc0/0x3c5d [ioatdma]
[ 11.041826] [<ffffffffa03cdd60>] ? __kstrtab_ioat_dma_setup_interrupts+0x30/0x30 [ioatdma]
[ 11.041831] [<ffffffff990f1d5f>] ? __check_object_size+0x7d/0x1fa
[ 11.041834] [<ffffffff99094282>] SyS_init_module+0x16c/0x17d
[ 11.041839] [<ffffffff99357b75>] system_call_fastpath+0x16/0x1b
[ 11.041841] ---[ end trace 35ce923f562dbc4a ]---
So I blacklisted the ioatdma module in /etc/modprobe.d/blacklist.conf.
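As a minimal sketch of that change: the helper name `blacklist_module` is hypothetical and only illustrates the single `blacklist ioatdma` line that ends up in the file. It appends the entry once and leaves the file alone if the entry is already there.

```shell
# Hypothetical helper (not part of any tool): add "blacklist <module>" to a
# modprobe.d file unless that exact line is already present.
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist.conf}"
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF "blacklist $module" "$conf" 2>/dev/null \
        || echo "blacklist $module" >> "$conf"
}

# Usage (as root), followed by a reboot:
#   blacklist_module ioatdma
# Afterwards "lsmod | grep ioatdma" should print nothing.
```

Note that a `blacklist` entry only stops the module from being auto-loaded through its aliases. An explicit `modprobe ioatdma` would still load it.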
After a reboot, the freeze occurs after ~2500 seconds instead.
These measurements are consistent across several reboots.
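For anyone trying to reproduce the timing, a loop along these lines could measure the time to freeze. This is only a sketch under assumptions: the function name `measure_until_fail` and the ping target are placeholders, and it is not necessarily how the figures above were obtained.

```shell
# Hypothetical helper: run the given command about once per second until it
# fails, then print how many seconds have elapsed since the start.
measure_until_fail() {
    start=$(date +%s)
    while "$@" >/dev/null 2>&1; do
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}

# Example, started right after eth0 is added to the vswitch.
# 192.0.2.1 is a placeholder for a host reachable only through the vswitch:
#   measure_until_fail ping -c 1 -W 2 192.0.2.1
```

The loop stops at the first failed probe, so a single lost packet would end the measurement early. Running it several times gives a better picture.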
Removing eth0 from the vswitch with “ovs-vsctl del-port vswitch0 eth0” did not return until I pressed Ctrl-C.
Without OVS, there is no freeze.
With a bridge instead of OVS, there is no freeze.
I use the exact same setup (the USB keys are cloned with dd) on another, older Supermicro box with no problem at all.
Unfortunately I do not have physical access to the boxes, as they are 1000 km away…
(from redmine: issue id 3688, created on 2015-01-09, closed on 2019-06-11)