<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/tools/testing/selftests/drivers/net/mlxsw/spectrum-2, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-06-27T00:53:07Z</updated>
<entry>
<title>selftests: forwarding: lib: Split setup_wait()</title>
<updated>2025-06-27T00:53:07Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2025-06-25T10:41:23Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b7863babce0a7aacb252e50d75b192476aa81ea1'/>
<id>urn:sha1:b7863babce0a7aacb252e50d75b192476aa81ea1</id>
<content type='text'>
setup_wait() takes an optional argument and then is called from the top
level of the test script. That confuses shellcheck, which thinks that maybe
the intention is to pass $1 of the script to the function, which is never
the case. To avoid having to annotate every single new test with a SC
disable, split the function in two: one that takes a mandatory argument,
and one that takes no argument at all.

Convert the two existing users of that optional argument, both in Spectrum
resource selftest, to use the new form. Clean up vxlan_bridge_1q_mc_ul.sh
to not pass a now-unused argument.

Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Link: https://patch.msgid.link/8e13123236fe3912ae29bc04a1528bdd8551da1f.1750847794.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors</title>
<updated>2024-06-10T10:14:52Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2024-06-06T14:49:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=75d8d7a63065b18df9555dbaab0b42d4c6f20943'/>
<id>urn:sha1:75d8d7a63065b18df9555dbaab0b42d4c6f20943</id>
<content type='text'>
ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
newer ASICs can share the same mask if their masks only differ in up to
8 consecutive bits. For example, consider the following filters:

 # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
 # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop

The second filter can use the same mask as the first (dst_ip/24) with a
delta of 1 bit.

However, the above only works because the two filters have different
values in the common unmasked part (dst_ip/24). When entries have the
same value in the common unmasked part they create undesired collisions
in the device since many entries now have the same key. This leads to
firmware errors such as [1] and to a reduced scale.

Fix by adjusting the hash table key to only include the value in the
common unmasked part. That is, without including the delta bits. That
way the driver will detect the collision during filter insertion and
spill the filter into the circuit TCAM (C-TCAM).

Add a test case that fails without the fix and adjust existing cases
that check C-TCAM spillage according to the above limitation.

[1]
mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))

Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Reported-by: Alexander Zubkov &lt;green@qrator.net&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Amit Cohen &lt;amcohen@nvidia.com&gt;
Tested-by: Alexander Zubkov &lt;green@qrator.net&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>selftests: forwarding: Change inappropriate log_test_skip() calls</title>
<updated>2024-03-29T01:03:42Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2024-03-26T16:54:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=677f394956e808c709c18b92bd01d19f14a96dd5'/>
<id>urn:sha1:677f394956e808c709c18b92bd01d19f14a96dd5</id>
<content type='text'>
The SKIP return should be used for cases where tooling of the machine under
test is lacking. For cases where HW is lacking, the appropriate outcome is
XFAIL.

This is the case with ethtool_rmon and mlxsw_lib. For these, introduce a
new helper, log_test_xfail().

Do the same for router_mpath_nh_lib. Note that it will be fixed using a
more reusable way in a following patch.

For the two resource_scale selftests, the log should simply not be written,
because there is no problem.

Cc: Tobias Waldekranz &lt;tobias@waldekranz.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Link: https://lore.kernel.org/r/3d668d8fb6fa0d9eeb47ce6d9e54114348c7c179.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: net: Correct couple of spelling mistakes</title>
<updated>2024-03-05T02:33:47Z</updated>
<author>
<name>Prabhav Kumar Vaish</name>
<email>pvkumar5749404@gmail.com</email>
</author>
<published>2024-02-28T12:07:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fb0f02308126736c015afc0cac3c0e8712443299'/>
<id>urn:sha1:fb0f02308126736c015afc0cac3c0e8712443299</id>
<content type='text'>
Changes :
	- "excercise" is corrected to "exercise" in drivers/net/mlxsw/spectrum-2/tc_flower.sh
	- "mutliple" is corrected to "multiple" in drivers/net/netdevsim/ethtool-fec.sh

Signed-off-by: Prabhav Kumar Vaish &lt;pvkumar5749404@gmail.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Link: https://lore.kernel.org/r/20240228120701.422264-1-pvkumar5749404@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>mlxsw: spectrum_acl_tcam: Fix stack corruption</title>
<updated>2024-01-18T17:48:08Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2024-01-17T15:04:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=483ae90d8f976f8339cf81066312e1329f2d3706'/>
<id>urn:sha1:483ae90d8f976f8339cf81066312e1329f2d3706</id>
<content type='text'>
When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.

One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.

In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.

Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.

Add a test case to make sure the machine does not crash when this
condition is hit.

[1]
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
[...]
 dump_stack_lvl+0x36/0x50
 panic+0x305/0x330
 __stack_chk_fail+0x15/0x20
 mlxsw_sp_acl_tcam_group_update+0x116/0x120
 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Reported-by: Orel Hagag &lt;orelh@nvidia.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Amit Cohen &lt;amcohen@nvidia.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure</title>
<updated>2024-01-18T17:48:08Z</updated>
<author>
<name>Amit Cohen</name>
<email>amcohen@nvidia.com</email>
</author>
<published>2024-01-17T15:04:16Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6d6eeabcfaba2fcadf5443b575789ea606f9de83'/>
<id>urn:sha1:6d6eeabcfaba2fcadf5443b575789ea606f9de83</id>
<content type='text'>
Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table-&gt;base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table-&gt;base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &amp;&gt;/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 &lt;TASK&gt;
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa105 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen &lt;amcohen@nvidia.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: mlxsw: Add scale test for port ranges</title>
<updated>2023-07-12T23:57:18Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2023-07-11T16:44:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=45c5a384765bb879f8a07e0056c0885771124271'/>
<id>urn:sha1:45c5a384765bb879f8a07e0056c0885771124271</id>
<content type='text'>
Query the maximum number of supported port range registers using
devlink-resource and test that this number can be reached by configuring
tc filters with different port ranges. Test that an error is returned in
case the maximum number is exceeded.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Link: https://lore.kernel.org/r/48eee181270d9f291e09d1858c7b26a3f7fcc164.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: mlxsw: Move IPv6 decap_error test to shared directory</title>
<updated>2022-12-09T02:46:32Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2022-12-07T12:36:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=db401875f438168c5804b295b93a28c7730bb57a'/>
<id>urn:sha1:db401875f438168c5804b295b93a28c7730bb57a</id>
<content type='text'>
Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Amit Cohen &lt;amcohen@nvidia.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: spectrum-2: tc_flower_scale: Dynamically set scale target</title>
<updated>2022-06-17T09:31:33Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2022-06-16T10:42:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ed62af45467a6786cbdeef42a7b4e7ced374f593'/>
<id>urn:sha1:ed62af45467a6786cbdeef42a7b4e7ced374f593</id>
<content type='text'>
Instead of hard coding the scale target in the test, dynamically set it
based on the maximum number of flow counters and their current
occupancy.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>selftests: mlxsw: Add a RIF counter scale test</title>
<updated>2022-06-17T09:31:33Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2022-06-16T10:42:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=be00853bfd2e704893916bc349e7ab1d50615cb4'/>
<id>urn:sha1:be00853bfd2e704893916bc349e7ab1d50615cb4</id>
<content type='text'>
This tests creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains the
traffic test, which tries to run traffic through a log2 of those counters
and checks that the traffic is shown in the counter values.

Like with tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that then every bit of the instantiated item's
number is exercised. This should catch issues whether they happen at the
high end, low end, or somewhere in between.

Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Amit Cohen &lt;amcohen@nvidia.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
