@pahud pahud commented Aug 13, 2025

Issue # (if applicable)

Closes #35220.

Reason for this change

Users need the ability to import existing SageMaker Pipelines into their CDK applications to integrate with other AWS services like EventBridge Scheduler. This addresses a common use case where data scientists develop and manage ML pipelines outside of CDK, but infrastructure teams need to schedule and orchestrate these pipelines as part of larger workflows.

An IPipeline interface already exists in aws-cdk-lib/aws-sagemaker, but the current @aws-cdk/aws-sagemaker-alpha package has no concrete implementation with import methods, making it impossible to reference existing pipelines in CDK applications.

Description of changes

This change adds import-only Pipeline functionality to the @aws-cdk/aws-sagemaker-alpha package. Note: This construct cannot create new SageMaker Pipelines - it only imports existing ones for infrastructure integration purposes.

  • Pipeline Class: New Pipeline class implementing the IPipeline interface from aws-cdk-lib/aws-sagemaker with static import methods
  • Import Methods:
    • Pipeline.fromPipelineArn() - Import pipeline by ARN (supports cross-account/region)
    • Pipeline.fromPipelineName() - Import pipeline by name (same account/region as stack)
    • Pipeline.fromPipelineAttributes() - Import with custom attributes for advanced scenarios
  • Pipeline Name Validation: AWS-compliant validation for pipeline names (1-256 chars, alphanumeric with hyphens, no consecutive hyphens)
  • IAM Integration: grantStartPipelineExecution() method for proper permission management
  • EventBridge Scheduler Integration: Full compatibility with SageMakerStartPipelineExecution target by implementing the standard IPipeline interface
  • Cross-account Support: Custom account/region parameters for importing pipelines from different environments
  • Exported Utilities: The validatePipelineName() function is exported for reuse by other constructs and user validation
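
The name rules listed above (1-256 characters, alphanumeric with hyphens, no consecutive hyphens) could be checked roughly as follows. This is a minimal sketch, not the PR's actual validatePipelineName() implementation: the function name `checkPipelineName` and the error wording are hypothetical, and the sketch assumes nothing beyond the three rules stated in the description (for example, it does not restrict leading or trailing hyphens).

```typescript
// Hypothetical sketch of the name rules stated in the PR description.
// This is NOT the PR's validatePipelineName(); names and messages are assumptions.
function checkPipelineName(name: string): string[] {
  const errors: string[] = [];

  // Rule 1: length must be between 1 and 256 characters.
  if (name.length < 1 || name.length > 256) {
    errors.push(`name must be 1-256 characters, got ${name.length}`);
  }

  // Rule 2: only alphanumeric characters and hyphens are allowed.
  if (!/^[a-zA-Z0-9-]*$/.test(name)) {
    errors.push('name may contain only alphanumeric characters and hyphens');
  }

  // Rule 3: no consecutive hyphens.
  if (name.includes('--')) {
    errors.push('name must not contain consecutive hyphens');
  }

  return errors;
}
```

Returning a list of messages rather than throwing on the first failure lets a caller report every problem at once; a construct would more likely throw at synth time, which is an open design choice here.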

The implementation follows the static import-method pattern that CDK constructs for other AWS services (S3, Lambda, IAM) already use, for consistency and familiarity.

Key design decisions:

  • Import-only approach: This L2 construct cannot create new SageMaker Pipelines - it only imports existing ones. SageMaker Pipelines are complex ML workflows with intricate step definitions, dependencies, and configurations that are best developed using SageMaker Studio, the SageMaker Python SDK, or other ML-focused tools. The constructor throws an error if instantiated directly, guiding users to use the static import methods instead.
  • Interface compatibility: Uses IPipeline interface from the stable aws-cdk-lib/aws-sagemaker package instead of duplicating it in the alpha package. This ensures compatibility with existing CDK services (like EventBridge Scheduler targets) and prevents interface conflicts when both packages are used together.
  • Alpha package location: New SageMaker L2 constructs belong in the experimental alpha package
  • Comprehensive validation: Pipeline names are validated against AWS requirements to prevent runtime errors. The validatePipelineName() function is also exported as a utility for other constructs and user validation.
  • Flexible import options: Multiple import methods support different use cases and deployment patterns
  • Separation of concerns: Data scientists manage pipeline development and iteration using ML tools, while infrastructure teams handle scheduling, monitoring, and integration via CDK
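
The import-only decision and the ARN/name import paths can be illustrated with a small standalone sketch. Everything here is hypothetical: `ImportOnlyPipeline`, `PipelineRef`, `fromArn`, and `fromName` are illustrative names rather than the PR's API, and the sketch works on plain strings only, so it ignores CDK tokens (unresolved values), which a real construct would have to handle. The ARN layout assumed is `arn:<partition>:sagemaker:<region>:<account>:pipeline/<name>`.

```typescript
// Hypothetical, framework-free illustration of an import-only class.
// None of these names come from the PR; CDK tokens are not handled.
interface PipelineRef {
  readonly pipelineArn: string;
  readonly pipelineName: string;
}

class ImportOnlyPipeline {
  // Direct construction is rejected; callers are pointed at the static importers.
  constructor() {
    throw new Error('Cannot create a pipeline; use fromArn() or fromName() to import one');
  }

  // Import by ARN: the name is the part after "pipeline/".
  static fromArn(arn: string): PipelineRef {
    const match = /^arn:[^:]+:sagemaker:[^:]*:[^:]*:pipeline\/(.+)$/.exec(arn);
    const name = match?.[1];
    if (name === undefined) {
      throw new Error(`Not a SageMaker pipeline ARN: ${arn}`);
    }
    return { pipelineArn: arn, pipelineName: name };
  }

  // Import by name: the ARN is assembled from the caller's account and region.
  static fromName(
    name: string,
    env: { account: string; region: string; partition?: string },
  ): PipelineRef {
    const partition = env.partition ?? 'aws';
    return {
      pipelineArn: `arn:${partition}:sagemaker:${env.region}:${env.account}:pipeline/${name}`,
      pipelineName: name,
    };
  }
}
```

In this sketch, importing by ARN carries the account and region inside the ARN itself, which is why that path suits cross-account or cross-region references, while importing by name needs the environment supplied separately.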

Describe any new or updated permissions being added

IAM Permissions: The grantStartPipelineExecution() method grants the sagemaker:StartPipelineExecution permission on the specific pipeline resource. This is the minimal permission required for EventBridge Scheduler and other services to trigger pipeline executions.

No additional resource access patterns are introduced - this follows the standard CDK grant pattern used throughout the framework.
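
For orientation, the IAM statement that such a grant would be expected to produce looks roughly like the object below. This is a sketch under assumptions: `startPipelineExecutionStatement` and `PolicyStatementJson` are hypothetical names, and the actual grant in the PR goes through the CDK grant machinery rather than building JSON by hand. Only the action name and the single-resource scoping are taken from the description above.

```typescript
// Hypothetical helper showing the shape of the resulting IAM statement.
// The PR itself uses the CDK grant pattern; this only mirrors the outcome.
interface PolicyStatementJson {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

function startPipelineExecutionStatement(pipelineArn: string): PolicyStatementJson {
  return {
    Effect: 'Allow',
    // The only action the description says is granted.
    Action: ['sagemaker:StartPipelineExecution'],
    // Scoped to the one imported pipeline, not to a wildcard.
    Resource: [pipelineArn],
  };
}
```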

Description of how you validated changes

Unit Tests: 20 comprehensive unit tests covering all functionality:

  • Import methods (fromPipelineArn, fromPipelineName, fromPipelineAttributes)
  • Pipeline name validation (valid and invalid formats)
  • Cross-account/region support
  • IAM grant methods
  • EventBridge Scheduler integration
  • Error handling and edge cases

Integration Test: Real-world deployment test (integ.pipeline-import.ts) that:

  • Creates supporting AWS resources (S3 buckets, IAM roles)
  • Creates a minimal SageMaker Pipeline using L1 constructs
  • Tests two of the three import methods (fromPipelineArn and fromPipelineName)
  • Validates IAM permission grants
  • Verifies EventBridge Scheduler compatibility

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

…ageMaker Pipelines

Add Pipeline class with import methods to enable importing existing SageMaker Pipelines
into CDK applications for use with EventBridge Scheduler and other AWS services.

- Add Pipeline.fromPipelineArn() for importing by ARN (cross-account/region support)
- Add Pipeline.fromPipelineName() for importing by name (same account/region)
- Add Pipeline.fromPipelineAttributes() for advanced import scenarios
- Add AWS-compliant pipeline name validation (1-256 chars, alphanumeric with hyphens)
- Add grantStartPipelineExecution() IAM method for proper permission management
- Add full EventBridge Scheduler integration with SageMakerStartPipelineExecution target
- Add comprehensive unit tests (17 tests) and integration test
- Add complete documentation with usage examples and EventBridge integration patterns

This enables separation of concerns where data scientists manage ML pipelines using
SageMaker tools while infrastructure teams can schedule and orchestrate them via CDK.

Closes aws#35220
@aws-cdk-automation aws-cdk-automation requested a review from a team August 13, 2025 17:29
@github-actions github-actions bot added the effort/medium (Medium work item – several days of effort), feature-request (A feature should be added or improved.), and p2 labels Aug 13, 2025
@mergify mergify bot added the contribution/core (This is a PR that came from AWS.) label Aug 13, 2025
@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

@pahud pahud changed the title from "feat(aws-sagemaker-alpha): add Pipeline import methods for existing SageMaker Pipelines" to "feat(sagemaker-alpha): add Pipeline import methods for existing SageMaker Pipelines" Aug 13, 2025
@aws-cdk-automation aws-cdk-automation dismissed their stale review August 13, 2025 17:46

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@@ -0,0 +1,189 @@
import { Construct } from 'constructs';
import { Grant, IGrantable } from 'aws-cdk-lib/aws-iam';
import { IPipeline } from 'aws-cdk-lib/aws-sagemaker';

Contributor Author

Not sure if we should duplicate the IPipeline in the alpha module. Tentatively import it from stable module for now. Please advise if we should duplicate instead.

@pahud pahud marked this pull request as ready for review August 14, 2025 14:20
@lorenzwalthert

@pahud what's stopping this PR from being reviewed and merged? I don't understand the processes here. If this isn't merged and released soon, I'll have to find a workaround.

pahud commented Nov 6, 2025

@lorenzwalthert this PR has been sitting in the team's review backlog. I'll raise it with the team for input. Thank you for the callout.

Development

Successfully merging this pull request may close these issues.

sagemaker: import existing SageMaker Pipeline into CDK

3 participants