glue-alpha: feedback on spark jobs constructs introduced in "Refactored glue-alpha L2 CDK construct RFC 0497" #33356

@humanzz

Describe the feature

Hello CDK team,

As a user of glue-alpha, and having contributed the initial job construct in #12506, I recently came across the refactor from #32521 as part of updating my CDK applications to 2.178.0.

Having refactored my code to use the new constructs (mostly ScalaSparkEtlJob), I noticed a few things and wanted to provide feedback on them.

Job Role requirements and its auto-creation

I think the previous behaviour, where the role prop was optional and a role was created automatically when none was provided, is a sensible default. That behaviour should be restored, and the code in the subclasses corrected accordingly.

Enums / Constants

I think the previous approach was better and in line with other constructs, e.g. Lambda. It should be restored to provide an easy way to adopt new values for GlueVersion and WorkerType without needing escape hatches or being blocked on CDK updates.
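
To illustrate the kind of enum-like class pattern meant here, below is a minimal, self-contained sketch. `GlueVersionLike` and the version strings are hypothetical stand-ins written for this example, not the glue-alpha source; the point is only that a static `of()` factory lets users pass a value the library does not know about yet.

```typescript
// Illustrative enum-like class: a stand-in for the pattern, not the glue-alpha source.
class GlueVersionLike {
  // Well-known values are exposed as static members, so they read like enum members.
  public static readonly V4_0 = new GlueVersionLike('4.0');
  public static readonly V5_0 = new GlueVersionLike('5.0');

  // Factory for values the library does not list yet.
  public static of(version: string): GlueVersionLike {
    return new GlueVersionLike(version);
  }

  // Private constructor: instances come from the static members or from of().
  private constructor(public readonly name: string) {}
}

const known = GlueVersionLike.V4_0;
const future = GlueVersionLike.of('99.0'); // hypothetical future version string
console.log(known.name, future.name);
```

With a plain TypeScript enum, the second case would require an escape hatch or a CDK release that adds the new member.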

extraJars, extraFiles, extraPythonFiles, extraJarsFirst

  • The following points apply to Spark jobs, whatever their language
  • extraJars, extraJarsFirst and extraFiles are applicable to all Spark jobs regardless of language (Scala/Python)
    • extraJars allows Spark to load JVM-based libraries that can be used across both Scala and Python Spark jobs
    • extraJarsFirst controls the order of jar loading for all Spark jobs, both Scala and Python
  • extraFiles is a way to load other files, e.g. binary files or text files, in Spark, again regardless of the Spark job's language
  • extraPythonFiles is only relevant to Python Spark jobs
  • The new constructs do not implement the above behaviour
    • extraJars is not implemented for Python Spark jobs
    • extraJarsFirst is implemented only in ScalaSparkFlexEtlJob, even though it should be available in all Spark jobs wherever extraJars is available
    • extraFiles seems to be completely missing from Scala Spark jobs
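
The intended behaviour described above can be sketched as a single helper shared by all Spark job constructs. This is a simplified illustration under stated assumptions, not the glue-alpha implementation: `SparkExtras` and `sparkExtraArguments` are hypothetical names, and file locations are plain S3 URL strings rather than the construct library's code asset objects. The argument keys are Glue's special job parameters (`--extra-jars`, `--user-jars-first`, `--extra-files`, `--extra-py-files`).

```typescript
type SparkLanguage = 'scala' | 'python';

// Hypothetical, simplified props: plain S3 URLs instead of code asset objects.
interface SparkExtras {
  extraJars?: string[];
  extraJarsFirst?: boolean;
  extraFiles?: string[];
  extraPythonFiles?: string[];
}

// Builds the Glue default arguments for the "extra" props of any Spark job.
function sparkExtraArguments(language: SparkLanguage, extras: SparkExtras): Record<string, string> {
  const args: Record<string, string> = {};

  // Applicable to every Spark job, Scala or Python.
  if (extras.extraJars?.length) {
    args['--extra-jars'] = extras.extraJars.join(',');
  }
  if (extras.extraJarsFirst) {
    args['--user-jars-first'] = 'true';
  }
  if (extras.extraFiles?.length) {
    args['--extra-files'] = extras.extraFiles.join(',');
  }

  // Only meaningful for Python Spark jobs.
  if (extras.extraPythonFiles?.length) {
    if (language !== 'python') {
      throw new Error('extraPythonFiles is only supported for Python Spark jobs');
    }
    args['--extra-py-files'] = extras.extraPythonFiles.join(',');
  }

  return args;
}

// A Python Spark job that also needs a JVM library, loaded ahead of the defaults.
const pythonArgs = sparkExtraArguments('python', {
  extraJars: ['s3://my-bucket/libs/connector.jar'], // hypothetical location
  extraJarsFirst: true,
  extraPythonFiles: ['s3://my-bucket/libs/utils.py'], // hypothetical location
});
console.log(pythonArgs);
```

Keeping the language check limited to extraPythonFiles means the Scala and Python constructs expose the same jar and file options and differ only where the languages actually differ.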

Therefore

Use Case

N/A

Proposed Solution

N/A

Other Information

N/A

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.178.0

Environment details (OS name and version, etc.)

macOS

Metadata

Assignees

No one assigned

Labels

  • @aws-cdk/aws-glue: Related to AWS Glue
  • effort/medium: Medium work item – several days of effort
  • feature-request: A feature should be added or improved.
  • p2
