Description
Describe the feature
Hello CDK team,
As a user of glue-alpha, and having contributed the initial job construct in #12506, I recently came across the refactor from #32521 as part of updating my CDK applications to 2.178.0.
Having refactored my code to use the new constructs (mostly the `ScalaSparkEtlJob`), I noticed a couple of things and wanted to provide feedback on them.
Job Role requirements and its auto-creation
- In the previous version, providing a `role` as part of `JobProps` was optional, and its absence led to the auto-creation of a default role
- In the new base `JobProperties` (sidenote: should it be called `JobProps` instead of `JobProperties`?), `role` is now required
- Moreover, code in subclasses seems to be wrongly doing something I don't understand e.g.
I think the previous behaviour of making the `role` prop optional is a sensible default. That behaviour should be restored, and the code in subclasses corrected.
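To make the requested behaviour concrete, here is a minimal sketch of an optional `role` prop with a default fallback. It is not the glue-alpha implementation: `RoleLike`, `SketchJobProps`, `SketchJob` and `createDefaultRole` are hypothetical stand-ins, and the real construct would create an IAM role rather than a plain object.

```typescript
// Hypothetical stand-ins for illustration; these are not the aws-glue-alpha types.
interface RoleLike {
  readonly roleName: string;
}

interface SketchJobProps {
  // Optional, as in the earlier JobProps: omit it to get a default role.
  readonly role?: RoleLike;
}

class SketchJob {
  public readonly role: RoleLike;

  constructor(id: string, props: SketchJobProps = {}) {
    // Fall back to a generated role only when the caller did not supply one.
    this.role = props.role ?? SketchJob.createDefaultRole(id);
  }

  // Stand-in for creating an IAM role that the Glue service can assume.
  private static createDefaultRole(id: string): RoleLike {
    return { roleName: `${id}-default-role` };
  }
}

// Usage: one job relies on the default, the other brings its own role.
const withDefault = new SketchJob('EtlJob');
const withCustom = new SketchJob('EtlJob', { role: { roleName: 'my-role' } });
```

With this shape, existing callers that pass a role keep working, and callers that omit it get the default again.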
Enums / Constants
- Several classes were changed to become enums, e.g. `GlueVersion` and `WorkerType`, which removed the ability to use newer values easily as was possible before, e.g. `GlueVersion.of(...)`.
- Documentation on those new enums still references the no-longer-existing methods `GlueVersion.of` and `WorkerType.of`, e.g. https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-glue-alpha/lib/constants.ts#L5
I think the previous approach was better and in line with other constructs, e.g. Lambda. It should be restored to provide an easy way of adopting new values for `GlueVersion` and `WorkerType` without the need to use escape hatches or be blocked on CDK updates.
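For reference, the enum-like class pattern I have in mind looks roughly like the sketch below. `SketchGlueVersion` is a hypothetical name and the listed values are only examples; the point is the static `of` factory, which a plain TypeScript `enum` cannot offer.

```typescript
// Hypothetical sketch of an enum-like class; not copied from aws-glue-alpha.
class SketchGlueVersion {
  // Well-known values are exposed as static members, so they read like enum members.
  public static readonly V4_0 = new SketchGlueVersion('4.0');
  public static readonly V5_0 = new SketchGlueVersion('5.0');

  // Factory for versions released after this library version was published.
  public static of(version: string): SketchGlueVersion {
    return new SketchGlueVersion(version);
  }

  // Private constructor: instances come from the static members or from of().
  private constructor(public readonly name: string) {}
}

// Usage: a known value, and a made-up future value the library does not list yet.
const known = SketchGlueVersion.V4_0;
const future = SketchGlueVersion.of('6.0');
```

Users still get autocompletion for the known values, and a new Glue release only needs a string rather than a CDK upgrade or an escape hatch.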
extraJars, extraFiles, extraPythonFiles, extraJarsFirst
- the following points are for spark jobs of different languages
  - `extraJars`, `extraJarsFirst` and `extraFiles` are applicable to all spark jobs regardless of language (Scala/Python)
    - `extraJars` allows spark to load jvm-based libraries that can be used across both `Scala` and `Python` spark jobs
    - `extraJarsFirst` is about the order of jar loading for all spark jobs across both `Scala` and `Python`
    - `extraFiles` is a way to load other files, e.g. binary files or text files, in spark, again regardless of the spark job's language
  - `extraPythonFiles` is only relevant to `Python` spark jobs
- the new constructs are not implementing the above behaviour
  - `extraJars` is not implemented for python spark jobs
  - `extraJarsFirst` is implemented only in `ScalaSparkFlexEtlJob`, even though it should be available in all spark jobs wherever `extraJars` is available
  - `extraFiles` seems to be completely missing from `Scala` spark jobs
Therefore
- all spark jobs should be updated to support the `extraJars`, `extraJarsFirst` and `extraFiles` props (I found a related feat(glue-alpha): include extra jars parameter in pyspark jobs #33238)
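As a rough illustration of the intended behaviour, the sketch below maps these props to Glue job arguments in a language-independent way, with only `extraPythonFiles` gated on Python. `SketchSparkExtras`, `SketchLanguage` and `sparkExtraArguments` are hypothetical names, S3 locations are simplified to plain strings, and the argument keys are assumed to be the Glue special parameters `--extra-jars`, `--user-jars-first`, `--extra-files` and `--extra-py-files`.

```typescript
// Hypothetical prop shape for illustration; S3 locations are plain strings here.
interface SketchSparkExtras {
  readonly extraJars?: string[];
  readonly extraJarsFirst?: boolean;
  readonly extraFiles?: string[];
  readonly extraPythonFiles?: string[]; // only meaningful for Python jobs
}

type SketchLanguage = 'scala' | 'python';

// Builds the default arguments for a spark job from the extra* props.
function sparkExtraArguments(
  language: SketchLanguage,
  extras: SketchSparkExtras,
): Record<string, string> {
  const args: Record<string, string> = {};

  // Jars and files apply to every spark job, whatever its language.
  if (extras.extraJars && extras.extraJars.length > 0) {
    args['--extra-jars'] = extras.extraJars.join(',');
  }
  if (extras.extraJarsFirst) {
    args['--user-jars-first'] = 'true';
  }
  if (extras.extraFiles && extras.extraFiles.length > 0) {
    args['--extra-files'] = extras.extraFiles.join(',');
  }

  // Python files are the only language-specific entry.
  if (language === 'python' && extras.extraPythonFiles && extras.extraPythonFiles.length > 0) {
    args['--extra-py-files'] = extras.extraPythonFiles.join(',');
  }
  return args;
}

// Usage: the same props passed to a Python job and to a Scala job.
const extras: SketchSparkExtras = {
  extraJars: ['s3://bucket/a.jar', 's3://bucket/b.jar'],
  extraJarsFirst: true,
  extraFiles: ['s3://bucket/lookup.csv'],
  extraPythonFiles: ['s3://bucket/helpers.py'],
};
const pythonArgs = sparkExtraArguments('python', extras);
const scalaArgs = sparkExtraArguments('scala', extras);
```

Both jobs receive the jar and file arguments; only the Python job receives `--extra-py-files`.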
Use Case
N/A
Proposed Solution
N/A
Other Information
N/A
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change
CDK version used
2.178.0
Environment details (OS name and version, etc.)
macOS