
Conversation

@oliviassss (Contributor)
Issue #, if available:

Description of changes:
Onboard networking components scale test

  • coredns
    • leverage the dnsperfgo tester in upstream perf-tests/clusterloader2, and collect DNS request latency metrics.
    • currently uses the default settings: the tester creates 5 DNS client pods, plus 1 extra DNS client pod for every 100 nodes in the cluster; we can fine-tune the config later.
  • kubeproxy
    • leverage the upstream clusterloader measurement to collect kube-proxy perf metrics for network programming latency.
    • currently tested with 5k endpoints; this can be made configurable if we want to test at larger scale later.
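
For context, clusterloader2 wires measurements like this in through its test config. A hedged sketch of what the kube-proxy latency step might look like (the measurement identifier is taken from upstream perf-tests; verify the exact name and params against your clusterloader2 version):

```yaml
# Sketch of a clusterloader2 step starting the upstream kube-proxy
# network-programming-latency measurement. Names assumed from upstream
# perf-tests, not taken from this PR's diff.
steps:
  - name: Start network programming latency measurement
    measurements:
      - Identifier: NetworkProgrammingLatency
        Method: NetworkProgrammingLatency
        Params:
          action: start
```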

Test:

Tested via internal pipeline for 1k nodes: link; test results
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

action: start
{{if $ENABLE_NETWORK_POLICY_ENFORCEMENT_LATENCY_TEST}}
- module:
path: modules/network-policy/net-policy-enforcement-latency.yaml
Contributor:

Where are these modules?

Contributor:

I see that you are copying this config to the cl2 load test dir in the task here - https://github.com/awslabs/kubernetes-iteration-toolkit/pull/536/files#diff-fc65d141840f1a98569538f9a751871bc06280084dec75e8e1777f47f429814dR175

Could you add a comment here explicitly saying that you are copying this file to a relative path under the cl2 load test dir? By default this won't work if your config file sits in some other directory.

Contributor Author:

Added a comment. The test needs to access modules under the cl2 folder, so copying the config into cl2 is cleaner.
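
To illustrate why the copy matters: CL2 resolves a module's `path:` relative to the load test dir, so the module file must sit under it. The paths below are stand-ins for demonstration, not the real repo layout:

```shell
# Hypothetical illustration: copy the module config under the cl2 load
# test dir so a relative "path: modules/..." reference resolves.
set -eu
SRC=$(mktemp -d)
CL2_LOAD_DIR=$(mktemp -d)/clusterloader2/testing/load
echo "steps: []" > "$SRC/net-policy-enforcement-latency.yaml"
mkdir -p "$CL2_LOAD_DIR/modules/network-policy"
cp "$SRC/net-policy-enforcement-latency.yaml" \
   "$CL2_LOAD_DIR/modules/network-policy/"
ls "$CL2_LOAD_DIR/modules/network-policy"
```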

name: test-svc-deployment
namespace: test-svc
spec:
replicas: 5000
Contributor:

Do you want to make this configurable?

@hakuna-matatah (Contributor), Jul 25, 2025:

We don't have to block on this for this PR if you want, but you may want to keep this configurable to test at different scales.

Contributor Author:

Yeah, I would prefer to take this as a TODO. I'm thinking of 2 options:

  1. Migrate from a deployment to a daemonset, enabling us to test endpoints across all nodes (where number of endpoints = node count)
  2. Implement a configurable setup with m services, each containing k endpoints, where both parameters are configurable

I'd like to evaluate these 2 options further with the actual cluster that we will use for all the testing, and decide on the better option (less time-costly, but still stress-testing kube-proxy).
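
A rough sketch of what option 2 could look like using CL2-style templating (parameter names here are made up for illustration, not from this PR):

```yaml
# Hypothetical CL2-templated spec for option 2: configurable endpoint
# count per service. Variable names are illustrative only.
{{$ENDPOINTS_PER_SERVICE := DefaultParam .ENDPOINTS_PER_SERVICE 5000}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-svc-deployment
  namespace: test-svc
spec:
  replicas: {{$ENDPOINTS_PER_SERVICE}}
```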

EOF
cat $(workspaces.source.path)/perf-tests/clusterloader2/pkg/prometheus/manifests/exporters/kube-state-metrics/deployment.yaml

# # TODO: Remove this once we fix https://github.com/kubernetes/kubernetes/issues/126578 or find a better way to work around it.
Contributor:

Not sure I get this?

Contributor Author:

I'm not sure if we still need this; it looks like the issue with the endpoint controller has been fixed upstream. Removing the coredns service monitor would cause Prometheus to stop scraping coredns metrics, so I commented it out.
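
For readers unfamiliar with the mechanism: Prometheus Operator discovers scrape targets via ServiceMonitor objects, which is why deleting the coredns one stops metric collection. A hedged sketch of what such an object looks like (field values are illustrative, not copied from the perf-tests manifests):

```yaml
# Illustrative ServiceMonitor: Prometheus scrapes coredns only while an
# object like this selects the kube-dns service. Values are examples.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coredns
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: metrics
      interval: 30s
```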


# create the service backed by 5k pods to test kubeproxy network programming performance
# we can tune the scale of pods later
kubectl apply -f $(workspaces.source.path)/perf-tests/clusterloader2/testing/load/test-svc.yaml
Contributor:

Why are you creating the workload even before the test is kicked off?

@oliviassss (Contributor Author), Jul 28, 2025:

It's better to create the service with endpoints before the clusterloader binary runs, since clusterloader only collects the kube-proxy metrics.

The service creation itself triggers kube-proxy to sync the network programming rules and generate the latency metrics (the metric measures the time gap between the endpoint creation timestamp and the time when kube-proxy finishes its programming, so the service should exist before clusterloader collects the metrics).
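
Only the Deployment fragment of test-svc.yaml is visible in this diff; presumably it is paired with a Service along these lines so that 5k endpoints are generated for kube-proxy to program (selector and ports are assumptions for illustration):

```yaml
# Hypothetical Service side of test-svc.yaml: its endpoints (one per
# backing pod) are what kube-proxy must program into node rules.
apiVersion: v1
kind: Service
metadata:
  name: test-svc
  namespace: test-svc
spec:
  selector:
    app: test-svc
  ports:
    - port: 80
      targetPort: 8080
```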

@mengqiy mengqiy merged commit 5cf3f10 into awslabs:main Jul 28, 2025