-
Couldn't load subscription status.
- Fork 4.3k
fix(stepfunctions): incorrect/missing permissions to run/redrive DistributedMap in state machine #34760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(stepfunctions): incorrect/missing permissions to run/redrive DistributedMap in state machine #34760
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This review is outdated)
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
3720ec0 to
d65f21d
Compare
This comment was marked as outdated.
This comment was marked as outdated.
8b6afd4 to
6d8438c
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kasra-G Thanks for the contribution, this seems to be a niche one
I will add inline comments for this
Since it has been a while since the issue was reported, I am not sure if having the RedriveExecution policy on the DistributedMap state type is a default expectation(maybe it's called out in the documentation and I missed it, if so would be nice to link it in the PR description)
There are integration tests which cover the basic distributed map functionality : packages/@aws-cdk-testing/framework-integ/test/aws-stepfunctions/test/integ.distributed-map.ts which were working before(?)without explictly adding the RedriveExecution based policy. (trying to make a case for not needing the RedriveExecution policy by default)
| const distributedMapPolicy = new iam.Policy(stateMachine, 'DistributedMapPolicy'); | ||
| stateMachine.grantStartExecution(distributedMapPolicy); | ||
| stateMachine.grantExecution(distributedMapPolicy, 'states:DescribeExecution', 'states:StopExecution'); | ||
| stateMachine.grantRedriveExecution(distributedMapPolicy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the linked issue, it seems like the StartExecution, StopExecution and DescribeExecution permissions are not enough for the DistributedMap state to work and it needs the RedriveExecution permission also to function porperly. Is this summarization correct ? (comment link)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be that the faulty policy before was the reason as it had stateMachine arn instead of execution arn :
new iam.PolicyStatement({
actions: ['states:DescribeExecution', 'states:StopExecution'],
resources: [`${stateMachine.stateMachineArn}:*`],
}),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StartExecution, StopExecution and DescribeExecution permissions are not enough for the DistributedMap state to work and it needs the RedriveExecution permission also to function porperly
Correct, it also needs RedriveExecution, otherwise, when the state machine is redriven, it will fail to redrive the map run.
it could be that the faulty policy before was the reason as it had stateMachine arn instead of execution arn
This might have contributed to other users facing errors, but its not quite clear to me what the effect of the incorrect permissions are. The docs state you need the permissions on the execution and not the state machine, so that change was made. The generated permissions in the console is not helpful in this regard as the resources being granted to is just * for the DescribeExecution permissions.
...aws-cdk-testing/framework-integ/test/aws-stepfunctions/test/integ.distributed-map-redrive.ts
Outdated
Show resolved
Hide resolved
…g.map-with-catch.ts failing integ test
6d8438c to
9efce58
Compare
|
Thanks for taking a look at the PR & the review. I have updated it and have a few responses:
I have updated the PR description with link to AWS docs talking about the permissions needed when adding a DistributedMap to a state machine.
Yes, those integ tests work, but they do not test the redrive capability. That is why I have added a new integration test that tests the redrive capability for a state machine with a distributed map.
The redrive permissions are added when creating a distributed map in a state machine via the AWS Console (and selecting the option to create a new role), but not in CDK Just an additional note, I updated the new integ test Please let me know if there's any questions |
This comment was marked as resolved.
This comment was marked as resolved.
|
I also have changes that would fix the same underlying issue as #29913 before it was closed for inactivity. I have decided to add on these changes to this PR, let me know if it should be a separate PR. |
| }); | ||
| }), | ||
|
|
||
| test('Instantiate State Machine With Self Referencing Distributed Map State', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing that we are adding new tests : Are these possibilities allowed, like a self-referencing distributedmap state ?
It seems counter-intuitive that this use of the sfns will be helpful but if that is the case a few lines of comment above the test around when it's possible will be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wont be the judge of if it is useful to customers but the console does allow self referencing distributed maps. Its certainly possible when combined with a Choice state, for example, to keep looping a state until some condition is met and break out the loop. The test ensures CDK supports this for DistributedMap (as it should any other state)
I can add a comment that explains why we are testing it, (to make sure theres no infinite loop in the BFS), but I won't explain why customers would do it; that seems out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, we do want to point out why we are testing it. Customers would be free to use the functionality if it's available.
A comment on the test would help us understand in future why is this being tested out
3e3e150 to
a7544b9
Compare
|
@kumvprat I have removed the fix for DistributedMap policies not showing up in nested StateGraphs and will raise a new PR as requested for that once when this one gets merged. Let me know if we actually want to put those changes in this PR. I will change the linked issue to a new one that is not the main discussion thread, as I do not want that thread accidentally getting resolved when there are still pending changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kasra-G Are the changes to just the index.js for aws-lambda-python-alphs and aws-ses packages integ tests needed ? The index.js is changing, which is an asset but there are no corresponding changes related to the asset elsewhere(like maybe the asset hash that changes when being used in the template)
Probably not. I think those integ tests failed on my first version of the PR so I ran them to update the files but there wasn't really any changes. I'll try resetting those changes. I'm actually still getting CHANGED integ tests in the |
|
Apologies, just noticed I have committed snapshot files for an integ test that I renamed all the way back in my first or second pr revision. I will remove those snapshot files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for detailed work on the PR
|
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
|
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
|
Comments on closed issues and PRs are hard for our team to see. |
Issue # (if applicable)
Closes #35390.
Reason for this change
When using the
DistributedMapstate, the state machine cannot be redriven without additional configuration from the user. These additional steps seem to sometimes lead users into circular dependency issues (see example).Additionally, the
DistributedMapassigns incorrect permissions for thestates:StopExecutionandstates:DescribeExecutionas these need to be applied to the execution arn, not the state machine arn (AWS Docs vs code reference.This PR also fixes an inconsistency for the recently added
grantRedriveExecutionfunction as it does not work for state machines that have DistributedMaps.The root cause is that when adding a DistributedMap to a state machine, additional permissions need to be added to the state machine's execution role to permit redriving a map run. Otherwise, when the state machine is redriven, the map run will fail to redrive. The docs provide a minimum permissions example https://docs.aws.amazon.com/step-functions/latest/dg/iam-policies-eg-dist-map.html#iam-policy-redrive-dist-map
The key permission is
states:RedriveExecutionon the following arn:arn:aws:states:us-east-2:123456789012:execution:myStateMachineName/myMapRunLabel:*However, when creating a state machine manually in the AWS Console, if there are any unlabeled distributed maps, the following policy is generated for the generated state machine role:
arn:aws:states:us-east-2:123456789012:execution:myStateMachineName/*:*This change mirrors that logic in CDK.
Description of changes
Minor: fix
map-with-catchinteg test failing because it was not emptying the bucket before attempting to delete itbindmethod for DistributedMaps to include correct redrive permissionsstates:StopExecutionandstates:DescribeExecutionpermissionsFix missing DistributedMap policy statements when the DistributedMap state was in a childThis will be handled in a separate PR.StateGraphsuch as aParallel.branchThese changes fix the root cause of
grantRedriveExecutionnot working for state machines with DistributedMapsDescribe any new or updated permissions being added
when there is any unlabeled distributed map run state
when there are only labeled distributed map runs
Description of how you validated changes
Unit & Integration tests
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license