- 
                Notifications
    You must be signed in to change notification settings 
- Fork 4.3k
          feat(stepfunctions-tasks): EmrCreateClusterOptions support ebsRootVolumeIops, ebsRootVolumeThroughput and managedScalingPolicy
          #34677
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| AWS CodeBuild CI Report
 Powered by github-codebuild-logs, available on the AWS Serverless Application Repository | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising the PR.
I have added inline comments where possible.
I have a high level question on the PR : It seems like the ebs root volume was already present as a property in the create emr cluster sfn task.
Is this change not impacting any other integration tests that might be testing emr cluster create task with any other ebs properties ?
        
          
                packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/emr/emr-create-cluster.ts
          
            Show resolved
            Hide resolved
        
              
          
                packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/emr/emr-create-cluster.ts
          
            Show resolved
            Hide resolved
        
              
          
                packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/emr/emr-create-cluster.ts
          
            Show resolved
            Hide resolved
        
      | }); | ||
| start.next(describe); | ||
|  | ||
| describe.expect(ExpectedResult.objectLike({ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a check here that the EMR cluster is created ? And also have a check that the cluster configuration match the cluster configuration provided in the step function step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure let me add, might take some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verification added.
In case you have a question why we need 2 describeExecution calls, just found out that we cannot have below 2 things at the same time:
- Validate execution with .expect(ExpectedResult.objectLike({ status: 'SUCCEEDED', }))
- Retrieve clusterId with .getAttString('output.ClusterId')
Because when we do .getAttString('output.ClusterId'), the response return nothing besides from ClusterId, making status: 'SUCCEEDED' fail although the response actually have it. Thats why we need 2 describeExecution calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the timeout is. :
totalTimeout: Duration.minutes(5)
Is this enough for the whole emr cluster to be created ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is enough. It works everytime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this statement from the doc When you launch a cluster, either for the first time or when the AWSServiceRoleForEMRCleanup service-linked role is not present, Amazon EMR creates the AWSServiceRoleForEMRCleanup service-linked role for you. is not true, at least in this case when StepFunction trigger it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should add the permissions for creating service linked role, in an idempotent manner
Because it seems like the cluster is not created properly without the role, and requires the ability to create the service linked role for cluster creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are the required permissions for that, from the same link :
You must have permissions to create a service-linked role. For an example statement that adds this capability to the permissions policy of an IAM entity (such as a user, group, or role):
Add the following statement to the permissions policy for the IAM entity that needs to create the service-linked role.
{
"Sid": "ElasticMapReduceServiceLinkedRole",
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam:::role/aws-service-role/elasticmapreduce.amazonaws.com/AWSServiceRoleForEMRCleanup*",
"Condition": {
"StringEquals": {
"iam:AWSServiceName": [
"elasticmapreduce.amazonaws.com",
"elasticmapreduce.amazonaws.com.cn"
]
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes may not be exactly this but would need some changes in the policy creation during the task creation : maybe we should change the policies added here
Since when using this stepfunction task the user might utilize the emr cluster created from this task for running jobs => The cluster creation should be successful once the task completes
(I have not used this feature personally, and it seems you have. So I would like to ask if adding these set of permissions would make this functionality easier to use? )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding these set of permissions would make this functionality easier to use?
You're great mate. Can confirm that if StepFunction/StateMachine has iam:CreateServiceLinkedRole then user doesn't need to precreate AWSServiceRoleForEMRCleanup. Better user experience of course.
Idempotent: YES. I executed the job 2 times and no issue when the role exists.
2c27e56    to
    5d92a2f      
    Compare
  
            
          
                packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/emr/emr-create-cluster.ts
              
                Outdated
          
            Show resolved
            Hide resolved
        
      5d92a2f    to
    f3f171c      
    Compare
  
    | Sorry I forgot these questions: 
 Only EBSsize prop exist, not IOPS+throughput 
 Yes, new props will be undefined for current test/user. We know this since no change in other snapshots. | 
55f08c9    to
    2813ef0      
    Compare
  
    | Still need security approval process but really appreciate @kumvprat for your prompt support so far 🚀 You're very helpful. | 
        
          
                packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/emr/emr-create-cluster.ts
              
                Outdated
          
            Show resolved
            Hide resolved
        
      …umeIops, ebsRootVolumeThroughput and managedScalingPolicy
2813ef0    to
    3964c6f      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!
| Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). | 
| Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). | 
| Comments on closed issues and PRs are hard for our team to see. | 
Issue # (if applicable)
Closes #33431
Reason for this change
Missing param in EmrCreateClusterOptions
Description of changes
validatePropsfunctionManagedScalingPolicy.ScalingStrategy is supported by awscli/API, but not yet supported by Step Function. Therefore not included in this PR

Describe any new or updated permissions being added
Add
iam:CreateServiceLinkedRolepermission for only EMR to createAWSServiceRoleForEMRCleanupas instructed by https://docs.aws.amazon.com/emr/latest/ManagementGuide/using-service-linked-roles-cleanup.html#create-service-linked-roleDescription of how you validated changes
Unit + Integ
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license