-
Notifications
You must be signed in to change notification settings - Fork 41
Implement per region instance type config for canary and e2e tests #87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement per region instance type config for canary and e2e tests #87
Conversation
|
Hi @rrkharse. Thanks for your PR. I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit, otherwise looks good
test/e2e/replacement_values.py
Outdated
| "SAGEMAKER_EXECUTION_ROLE_ARN": get_bootstrap_resources().ExecutionRoleARN, | ||
| "MODEL_MONITOR_ANALYZER_IMAGE_URI": f"{MODEL_MONITOR_IMAGE_URIS[get_region()]}/sagemaker-model-monitor-analyzer", | ||
| "CLARIFY_IMAGE_URI": f"{CLARIFY_IMAGE_URIS[get_region()]}/sagemaker-clarify-processing:1.0", | ||
| "ENDPOINT_INSTANCE_TYPE": ENDPOINT_INSTANCE_TYPES[get_region()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since most regions, we can use same instance type, can we use
ENDPOINT_INSTANCE_TYPES.get(get_region(), ml.c5.large)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, added changes below
…ANCE_TYPES and TRAINING_JOB_INSTANCE_TYPES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: akartsky, rrkharse, surajkota The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
Description of changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region - Update eksctl latest git release URL: The URL has been updated and the old URL no longer works - Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits. Precedence for changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33 - Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87 - Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…ws-controllers-k8s#87) Description of changes: Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests. However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type. Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue. This pull request updates the testing resource config file `replacement_values.py` to pass in the correct instance type depending on region. Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions. The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Description of changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region - Update eksctl latest git release URL: The URL has been updated and the old URL no longer works - Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits. Precedence for changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33 - Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87 - Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
* Implement per region instance type config for canary and e2e tests (#87) Description of changes: Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests. However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type. Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue. This pull request updates the testing resource config file `replacement_values.py` to pass in the correct instance type depending on region. Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions. The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. * enable adopted resource test in canaries * inc delete wait for endpoint Co-authored-by: Rahul Kharse <[email protected]>
Description of changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region - Update eksctl latest git release URL: The URL has been updated and the old URL no longer works - Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits. Precedence for changes: - Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33 - Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87 - Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. Issue #, if available: Description of changes: By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Description of changes:
Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests.
However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type.
Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue.
This pull request updates the testing resource config file
replacement_values.pyto pass in the correct instance type depending on region.Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions.
The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.