
[CSIT-1948] NICs do not consistently distribute tunnels over RXQs depending on model or plugin #4030

Open
vvalderrv opened this issue Feb 4, 2025 · 6 comments


Description

For most IPsec tests on 3n-icx, the AVF plugin puts all tunnels into a single RXQ, while the dpdk plugin distributes them fairly across queues.

Telemetry shows the 40tnl test on AVF [0], compared to the same test on Cx6dx with the mlx5 driver (dpdk plugin) [1].

Current trending [2] shows this issue.

(The trending also shows a two-band structure for 4c AVF tests, caused by which testbed got reserved; that is probably unrelated to this RSS issue.)

[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/433/log.html.gz#s1-s1-s1-s1-s20-t1-k2-k13-k9-k14-k1-k1-k1-k1

[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/433/log.html.gz#s1-s1-s1-s1-s8-t1-k2-k13-k9-k14-k1-k1-k1-k1

[2] https://csit.fd.io/trending/#eNrtlMGKwjAQhp-me5GBpjbWiwe177HE6aiBpo1JVluffmN3YSqLsOjCHvSSQP7J_H8-hvjQOnr3VC8SuUqKVZIVuopLMl1O4na0FqYNaOxApOmOMiuwm1UdmLqTgK63oQUhxXwDAoHCXttcW08Yq9PQ1P4E8WSjPIFuAijymZzt0IBx7uKSrS8u1Ue4smTF7ntWbgbheuVI8YWvfKwG8iOj36flDlunDHl9Jm4zvJ4rMPIciXjtHno7Ur8hFOVQ8Qj8_AX_J_z8j-HTXKR4AHXc_vPkc5Dnmfxb8PMX_PsnX5ZvTevM8PfL8hMROrZ7
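For background on why AVF might collapse all tunnels onto one RXQ: NIC RSS typically computes a Toeplitz hash over selected header fields and maps the result to a queue. The sketch below is illustrative only (not CSIT or VPP code; the queue count and addresses are made up) and shows that tunnels sharing one outer IP pair necessarily land in a single queue, while tunnels with distinct outer addresses can spread out:

```python
# Illustrative sketch, not CSIT/VPP code: a toy Toeplitz RSS computation
# showing how tunnel flows map to RX queues. Queue count and IPs are made up.
import ipaddress
from collections import Counter

# The well-known Microsoft default Toeplitz key (40 bytes).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)

def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """Compute a 32-bit Toeplitz hash over `data`."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # XOR in the 32-bit key window starting at bit i*8+b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def rxq_for_flow(src: str, dst: str, n_queues: int) -> int:
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return toeplitz_hash(data) % n_queues

# 40 tunnels with distinct outer destination IPs spread over several queues...
varied = Counter(rxq_for_flow("192.168.0.1", f"10.0.0.{i}", 4) for i in range(40))
# ...while 40 tunnels sharing one outer IP pair all land on a single queue,
# because ESP carries no ports and the hash sees only the (constant) IP pair.
shared = Counter(rxq_for_flow("192.168.0.1", "10.0.0.1", 4) for _ in range(40))
print(varied)
print(shared)
```

If a driver's default hash covers only fields that are constant across tunnels (or skips the varying fields), all tunnels collapse into one queue regardless of how many RXQs are configured.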

Assignee

Unassigned

Reporter

Vratko Polak

Comments

  • vrpolak (Wed, 20 Nov 2024 12:57:41 +0000): Still present around rls2410, here [9] is a simpler trending link.

[9] https://csit.fd.io/trending/#eNrVUstuwjAQ_Jr0Uq0UB9xw6aGQ_0COsxCreSy7BhS-vo5byeFQCW7txfZ6dnbGI4sfGfeC3Xumt1m5zYrSNWHJVh-vYbsQwWqoQYhB5fkRC1K4Ubk9gbkcwPJEfgSl1aYGZQF962jtSNCu_dDJFUJZG0FwgweDUui3o-2hZ541it2s0Zz9nWBCqJ0S8ruNRDCMJjG-3SXUoyyUHvSa6Ac2PYq7YZoRH546bMhyAdp7aT_RAv2JoKxixzPBN9R8_oXkZx__PHpdvQwj9_Hvh7O04xU8O9NJvArkWMzduvoC9nnofw

  • vrpolak (Mon, 12 Aug 2024 13:43:25 +0000): Still present on rls2406.
  • vrpolak (Fri, 1 Mar 2024 12:41:39 +0000):

    The only other bad combination I found (in rls2310 iterative results):

3n-tsh Intel-X520 dpdk plugin SRv6 is bad: [8].

[8] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2310-3n-tsh/22/log.html.gz#s1-s1-s1-s6-s2-t3-k2-k9-k22-k14-k1-k1-k1-k1

  • vrpolak (Thu, 29 Feb 2024 17:31:11 +0000):

    The pattern seems to be opposite for SRv6 tests.

Here, mlx5 suites show bad worker distribution [6], but avf suites are good [7].

Not sure yet if it is possible to hardcode a default RSS hash function in VPP that would work for all encap/decap tests at once.

[6] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/433/log.html.gz#s1-s1-s1-s6-s2-t3-k2-k9-k9-k14-k1-k1-k1-k1

[7] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/433/log.html.gz#s1-s1-s1-s6-s8-t3-k2-k9-k9-k14-k1-k1-k1-k1

  • vrpolak (Tue, 27 Feb 2024 12:28:24 +0000): Currently I believe it is better to control RSS behavior from the VPP side using existing APIs (instead of relying on preparation before VPP starts). We already have suites that do that (see lines 121-126 in [5]); we just need to make sure those steps work (or at least do not fail) on all tested NIC+driver combinations.

[5] https://gerrit.fd.io/r/c/csit/+/36119/8/tests/vpp/perf/crypto/10ge2p1x710-ethip4ipsec1000tnlsw-fixtnlip-ip4base-policy-flow-rss-aes256gcm-ndrpdr.robot

  • vrpolak (Mon, 26 Feb 2024 17:40:30 +0000): > two-band structure for 4c AVF tests caused by which testbed got reserved

That seems to be an unrelated issue. The telemetry from the worse run [3] shows that vpp_wk_0 (on either DUT) did not read 256-packet vectors from the TG side (it did from the DUT side), while in the better run [4] it did (and the other workers also read from the TG side in larger chunks).

Still, both issues may be related to some NIC configuration, so may get fixed together.

[3] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/433/log.html.gz#s1-s1-s1-s1-s20-t3-k2-k13-k9-k14-k1-k1-k1-k1

[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/432/log.html.gz#s1-s1-s1-s1-s20-t3-k2-k13-k9-k14-k1-k1-k1-k1
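The opposite avf/mlx5 pattern seen for SRv6 is consistent with drivers defaulting to different RSS field selections, which also suggests why a single hardcoded hash function may not fit all encap/decap tests at once. A toy sketch (the additive "hash", addresses, and queue count are stand-ins, not any NIC's real RSS) of how the choice of hashed fields decides whether flows spread:

```python
# Toy illustration only, not any NIC's real RSS: whether flows spread over
# RX queues depends on whether the fields the driver hashes actually vary.
from collections import Counter

def rxq(fields: tuple, n_queues: int = 4) -> int:
    # Stand-in "hash": sum of character codes. Real NICs use Toeplitz etc.
    return sum(ord(c) for f in fields for c in f) % n_queues

# 40 flows with a constant source and varying destinations,
# roughly like one direction of an SRv6 encap test.
flows = [("2001:db8::1", f"2001:db8:1::{i}") for i in range(40)]

src_only = Counter(rxq((src,)) for src, dst in flows)
src_dst = Counter(rxq((src, dst)) for src, dst in flows)
print(src_only)  # one queue: the only hashed field never varies
print(src_dst)   # several queues: the varying destination enters the hash
```

A hash selection that balances the encap direction (varying inner/outer fields on one side) can leave the decap direction hashing over constant fields, and vice versa, which matches the "opposite pattern" observation above.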

Original issue: https://jira.fd.io/browse/CSIT-1948
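The "fair distribution" judgment made from the telemetry links above can be expressed as a simple balance check. This is a hypothetical sketch (the function name, threshold, and sample counts are invented; real per-worker numbers come from the linked log.html.gz telemetry):

```python
# Hypothetical sketch: decide from per-worker packet counts whether RSS
# spread the tunnels over RX queues. Threshold and sample data are invented.
from statistics import mean

def rss_is_balanced(pkts_per_worker: list, tolerance: float = 0.5) -> bool:
    """Consider the distribution fair if every worker handled at least
    `tolerance` times the mean per-worker packet count."""
    avg = mean(pkts_per_worker)
    if avg == 0:
        return False
    return min(pkts_per_worker) >= tolerance * avg

# AVF-style pathology: one worker receives everything.
print(rss_is_balanced([40_000_000, 0, 0, 0]))
# dpdk/mlx5-style fair spread across four workers.
print(rss_is_balanced([9_800_000, 10_100_000, 10_050_000, 10_050_000]))
```

A check like this could run right after the RSS-configuration steps mentioned in [5], turning the manual telemetry inspection into a pass/fail signal per NIC+driver combination.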

