Skip to content

Conversation

@rrkharse
Copy link
Contributor

Description of changes:

Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests.
However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type.
Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue.

This pull request updates the testing resource config file replacement_values.py to pass in the correct instance type depending on region.
Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions.

The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ack-bot ack-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2021
@ack-bot
Copy link
Collaborator

ack-bot commented Aug 30, 2021

Hi @rrkharse. Thanks for your PR.

I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ack-bot ack-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2021
@rrkharse rrkharse marked this pull request as ready for review August 30, 2021 16:34
@ack-bot ack-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2021
Copy link
Member

@surajkota surajkota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit, otherwise looks good

"SAGEMAKER_EXECUTION_ROLE_ARN": get_bootstrap_resources().ExecutionRoleARN,
"MODEL_MONITOR_ANALYZER_IMAGE_URI": f"{MODEL_MONITOR_IMAGE_URIS[get_region()]}/sagemaker-model-monitor-analyzer",
"CLARIFY_IMAGE_URI": f"{CLARIFY_IMAGE_URIS[get_region()]}/sagemaker-clarify-processing:1.0",
"ENDPOINT_INSTANCE_TYPE": ENDPOINT_INSTANCE_TYPES[get_region()],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since most regions, we can use same instance type, can we use
ENDPOINT_INSTANCE_TYPES.get(get_region(), ml.c5.large)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, added changes below

@rrkharse rrkharse marked this pull request as draft August 30, 2021 19:56
@ack-bot ack-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2021
@rrkharse rrkharse marked this pull request as ready for review August 30, 2021 19:58
@ack-bot ack-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2021
Copy link
Member

@surajkota surajkota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@ack-bot ack-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 30, 2021
@ack-bot
Copy link
Collaborator

ack-bot commented Aug 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akartsky, rrkharse, surajkota

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@surajkota
Copy link
Member

/retest

@ack-bot ack-bot merged commit fb8d14e into aws-controllers-k8s:release-v0.0.3 Aug 30, 2021
ack-bot pushed a commit to aws-controllers-k8s/applicationautoscaling-controller that referenced this pull request Aug 31, 2021
Description of changes:
- Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region
- Update eksctl latest git release URL: The URL has been updated and the old URL no longer works
- Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits.

Precedence for changes:

- Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33
- Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87
- Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
surajkota pushed a commit to surajkota/ack-sagemaker-controller that referenced this pull request Sep 1, 2021
…ws-controllers-k8s#87)

Description of changes:

Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests. 
However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type.
Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue. 

This pull request updates the testing resource config file `replacement_values.py` to pass in the correct instance type depending on region. 
Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions.

The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
mbaijal pushed a commit to mbaijal/applicationautoscaling-controller that referenced this pull request Sep 1, 2021
Description of changes:
- Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region
- Update eksctl latest git release URL: The URL has been updated and the old URL no longer works
- Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits.

Precedence for changes:

- Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33
- Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87
- Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
surajkota added a commit that referenced this pull request Sep 1, 2021
* Implement per region instance type config for canary and e2e tests (#87)

Description of changes:

Currently, we are using one fixed instance type across all AWS regions in our endpoint and training job tests. 
However, certain regions do not support the currently specified instance type or require a limit increase to use that instance type.
Specifically, canary tests in the eu-west-3 and eu-north-1 regions are failing due to this issue. 

This pull request updates the testing resource config file `replacement_values.py` to pass in the correct instance type depending on region. 
Regions that did not experience this issue will continue to use the previous instance type via the new config to avoid breaking canaries/e2e testing in those regions.

The changes have been tested in our eu-west-3 and eu-north-1 canary stacks and have resulted in passing canaries.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

* enable adopted resource test in canaries

* inc delete wait for endpoint

Co-authored-by: Rahul Kharse <[email protected]>
ack-bot pushed a commit to aws-controllers-k8s/applicationautoscaling-controller that referenced this pull request Sep 1, 2021
Description of changes:
- Remove dockerhub login step and use ECR public gallery ubuntu image: Individual user credentials need to be created and registered for all regions to pull from dockerhub. However, this step is redundant when we are already using credentials vended by ECR and can pull from the ECR public gallery. Making this change to facilitate region expansion by avoiding having to create new dockerhub users for each region
- Update eksctl latest git release URL: The URL has been updated and the old URL no longer works
- Implement per region instance type config for canary and e2e tests: Canary tests in the eu-north-1 region are failing since the currently specified instance type requires a limit increase to be used. Implementing a config to specify instance type per region that is within the default limits.

Precedence for changes:

- Remove dockerhub login step and use ECR public gallery ubuntu image: aws-controllers-k8s/sagemaker-controller@2ae5c33
- Update eksctl latest git release URL: aws-controllers-k8s/sagemaker-controller#87
- Implement per region instance type config for canary and e2e tests: aws-controllers-k8s/sagemaker-controller#87


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm Indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants