Commit 851ab7b

Chenguang Zhao authored and NipaLocal committed
net/mlx5: Flag state up only after cmdif is ready
When the driver is reloading during the recovery flow, it can't accept new
commands until the command interface is up again. Otherwise we may hit a
NULL pointer dereference while accessing uninitialized command structures.

The issue can be reproduced using the following scripts:

1) Use the following script to trigger a PCI error:

    for((i=1;i<1000;i++)); do
        echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
        echo "pci reset test $i times"
    done

2) Use the following script to read the speed:

    while true; do cat /sys/class/net/eth0/speed &> /dev/null; done

task: ffff885f42820fd0 ti: ffff88603f758000 task.ti: ffff88603f758000
RIP: 0010:[] [] dma_pool_alloc+0x1ab/0x290
RSP: 0018:ffff88603f75baf0 EFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff882f77d90c80 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff882f77d90d10
RBP: ffff88603f75bb20 R08: 0000000000019ba0 R09: ffff88017fc07c00
R10: ffffffffc0a9c384 R11: 0000000000000246 R12: ffff882f77d90d00
R13: 00000000000080d0 R14: ffff882f77d90d10 R15: ffff88340b6c5ea8
FS: 00007efce8330740(0000) GS:ffff885f4da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000003454fc6000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call trace:
 mlx5_alloc_cmd_msg+0xb4/0x2a0 [mlx5_core]
 mlx5_alloc_cmd_msg+0xd3/0x2a0 [mlx5_core]
 cmd_exec+0xcf/0x8a0 [mlx5_core]
 mlx5_cmd_exec+0x33/0x50 [mlx5_core]
 mlx5_core_access_reg+0xf1/0x170 [mlx5_core]
 mlx5_query_port_ptys+0x64/0x70 [mlx5_core]
 mlx5e_get_link_ksettings+0x5c/0x360 [mlx5_core]
 __ethtool_get_link_ksettings+0xa6/0x210
 speed_show+0x78/0xb0
 dev_attr_show+0x23/0x60
 sysfs_read_file+0x99/0x190
 vfs_read+0x9f/0x170
 SyS_read+0x7f/0xe0
 tracesys+0xe3/0xe8

Fixes: a80d1b6 ("net/mlx5: Break load_one into three stages")
Signed-off-by: Chenguang Zhao <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
1 parent ccd5135 commit 851ab7b

1 file changed, +3 -2 lines changed

drivers/net/ethernet/mellanox/mlx5/core/main.c

Lines changed: 3 additions & 2 deletions
@@ -1210,6 +1210,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
 
+	/* remove any previous indication of internal error */
+	dev->state = MLX5_DEVICE_STATE_UP;
+
 	err = mlx5_core_enable_hca(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "enable hca failed\n");
@@ -1602,8 +1605,6 @@ int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery)
 		mlx5_core_warn(dev, "interface is up, NOP\n");
 		goto out;
 	}
-	/* remove any previous indication of internal error */
-	dev->state = MLX5_DEVICE_STATE_UP;
 
 	if (recovery)
 		timeout = mlx5_tout_ms(dev, FW_PRE_INIT_ON_RECOVERY_TIMEOUT);
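
The effect of moving the state assignment can be illustrated with a toy model. This is only a sketch, not driver code: `toy_dev`, `toy_cmd_exec`, `reload_old_order` and `reload_new_order` are hypothetical names, and the model assumes that a command is refused while the device is flagged as being in internal error and that the command structures are unusable until the command interface has been brought up. The "concurrent" command (such as the sysfs speed read in the reproduction script) is simulated by a call placed inside the reload window.

```c
#include <stddef.h>

/* Hypothetical stand-ins for driver state; these are NOT the mlx5 types. */
enum toy_dev_state { TOY_DEV_INTERNAL_ERROR, TOY_DEV_UP };
enum toy_cmd_result { TOY_CMD_OK, TOY_CMD_REJECTED, TOY_CMD_WOULD_CRASH };

struct toy_dev {
	enum toy_dev_state state;
	int *cmd_pool; /* NULL until the command interface is initialised */
};

static int toy_pool_storage;

/* Models a command issued from another context during reload. */
enum toy_cmd_result toy_cmd_exec(const struct toy_dev *dev)
{
	if (dev->state != TOY_DEV_UP)
		return TOY_CMD_REJECTED;    /* assumed: error state refuses commands */
	if (dev->cmd_pool == NULL)
		return TOY_CMD_WOULD_CRASH; /* would touch uninitialised structures */
	return TOY_CMD_OK;
}

/* Ordering before the patch: state is flagged up before cmdif is ready. */
enum toy_cmd_result reload_old_order(struct toy_dev *dev)
{
	enum toy_cmd_result seen;

	dev->state = TOY_DEV_UP;            /* flagged up too early */
	seen = toy_cmd_exec(dev);           /* command arrives inside the window */
	dev->cmd_pool = &toy_pool_storage;  /* command interface comes up */
	return seen;
}

/* Ordering after the patch: state is flagged up only once cmdif is ready. */
enum toy_cmd_result reload_new_order(struct toy_dev *dev)
{
	enum toy_cmd_result seen;

	seen = toy_cmd_exec(dev);           /* same command, same point in time */
	dev->cmd_pool = &toy_pool_storage;  /* command interface comes up */
	dev->state = TOY_DEV_UP;            /* only now clear the error state */
	return seen;
}
```

Under these assumptions, a command arriving in the window hits the uninitialised structures with the old ordering and is refused with the new one. Once the reload has finished, commands succeed in both cases.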
