@chithreshazad
Contributor

Description / Motivation: We need to move the manually created nodeclasses and nodepools in the perflab-titan-1 cluster to KIT so they can be reused in future runs.

Related Asana Task: https://app.asana.com/1/8442528107068/project/1209254984904634/task/1211563354393458?focus=true

Desktop Testing: Tested by creating a pipeline run: https://experimental.scalability.eks.aws.dev/#/namespaces/scalability/pipelineruns/chithres-titan-ai-ml-pipeline-run-v27. Once this commit is merged, I will also raise a PR for the ai-ml-load Pipeline. Currently, the Pipeline points to the nodepool and nodeclass URLs in my KIT fork.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines +6 to +16
disruption:
    budgets:
    - nodes: 100%
      reasons:
      - Empty
    - nodes: 10%
      reasons:
      - Drifted
      - Underutilized
    consolidateAfter: 0s
    consolidationPolicy: WhenEmpty
Contributor Author

In the perflab-titan-1 account, the operator nodepool has a 0% disruption budget, which essentially means it will never scale down. I changed this to match what some other nodepools have, so that replicas can scale down. The current setting in that account is:

disruption:
    budgets:
    - nodes: 0%
    consolidateAfter: 0s
    consolidationPolicy: WhenEmpty
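
To illustrate why a 0% budget blocks scale-down, here is a minimal sketch of how a percentage budget could translate into a node count. It assumes the behavior described in the Karpenter documentation, where a percentage is rounded up and nodes that are already deleting or not ready are subtracted. The function name `allowed_disruptions` and its parameters are hypothetical and are not part of Karpenter.

```python
def allowed_disruptions(total_nodes: int, budget: str,
                        deleting: int = 0, not_ready: int = 0) -> int:
    """Estimate how many nodes one budget entry lets Karpenter disrupt at once.

    Assumption: percentage budgets are rounded up, and nodes that are
    already deleting or not ready count against the budget.
    """
    if budget.endswith("%"):
        percent = int(budget[:-1])
        # Integer ceiling of total_nodes * percent / 100 (avoids float error).
        limit = (total_nodes * percent + 99) // 100
    else:
        # A plain number such as "5" is an absolute node count.
        limit = int(budget)
    return max(0, limit - deleting - not_ready)


if __name__ == "__main__":
    for budget in ("0%", "10%", "100%"):
        print(budget, allowed_disruptions(20, budget))
```

Under these assumptions, a 0% budget yields zero allowed disruptions for any nodepool size, so empty nodes are never removed, while the 100% budget for the `Empty` reason lets all empty nodes go at once.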

…rors

Description / Motivation: The kustomize-controller is failing with the following error:

```
kubectl logs kustomize-controller-8589b7fd57-62kv4 -n flux-system | grep error
...
{"level":"error","ts":"2025-10-28T00:20:58.625Z","logger":"controller.kustomization","msg":"Reconciliation failed after 482.930394ms, next try in 2m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kubernetes-iteration-toolkit","namespace":"tekton-pipelines","revision":"main/5009ee34941fc5f77054986fc450c5be68cc8bb0","error":"Pipeline/scalability/derekff-karpenter-testing dry-run failed, reason: BadRequest, error: admission webhook \"validation.webhook.pipeline.tekton.dev\" denied the request: validation failed: non-existent variable in \"$(params.slack-hook)\": spec.finally[0].params[slack-hook]\n"}
```

This is likely blocking Flux from syncing from KIT.
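
An error of this kind can be caught before Flux applies the manifest. The sketch below is a hypothetical pre-merge check, not part of Tekton or Flux: it scans a Pipeline (already loaded as a Python dict) for `$(params.<name>)` references in `tasks` and `finally` that are not declared under `spec.params`. It only handles simple string parameter values; array and object parameter references are out of scope. The helper name `undeclared_params` and the sample pipeline are assumptions for illustration.

```python
import re

# Matches simple references such as $(params.slack-hook).
PARAM_REF = re.compile(r"\$\(params\.([\w-]+)\)")


def undeclared_params(pipeline: dict) -> list:
    """Return (location, param name) pairs for references to undeclared params."""
    spec = pipeline.get("spec", {})
    declared = {p["name"] for p in spec.get("params", [])}
    missing = []
    for section in ("tasks", "finally"):
        for index, task in enumerate(spec.get(section, [])):
            for param in task.get("params", []):
                # Only string values are inspected in this sketch.
                for name in PARAM_REF.findall(str(param.get("value", ""))):
                    if name not in declared:
                        location = f"spec.{section}[{index}].params[{param['name']}]"
                        missing.append((location, name))
    return missing


if __name__ == "__main__":
    # Hypothetical pipeline that reproduces the shape of the reported error.
    sample = {
        "spec": {
            "params": [],
            "finally": [
                {"name": "notify",
                 "params": [{"name": "slack-hook",
                             "value": "$(params.slack-hook)"}]},
            ],
        }
    }
    print(undeclared_params(sample))
```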
@chithreshazad chithreshazad deleted the karpenterAutomation branch October 28, 2025 17:00