
Conversation

@jiangzho (Contributor) commented on Aug 29, 2024

What changes were proposed in this pull request?

This PR includes Operator docs under docs/ for configuration, architecture, operations, and metrics.

Why are the changes needed?

Operator docs are necessary for users to understand the design and to get started with the operator installation.

Does this PR introduce any user-facing change?

No - new release

How was this patch tested?

CIs

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun (Member) left a comment:

Could you make CI happy?


dependencies {
implementation project(":spark-operator")
implementation("org.projectlombok:lombok:$lombokVersion")
Member comment:

To @jiangzho, it seems that your repository is a little outdated.

Apache Spark Kubernetes Operator follows the Gradle Version Catalog. Please rebase your repository and refer to the following commit.

@dongjoon-hyun (Member):

Gentle ping, @jiangzho.

@dongjoon-hyun changed the title from "[SPARK-49464] Add docs for operator" to "[SPARK-49464] Add documentations" on Sep 5, 2024

- JDK17
- Operator used fabric8 which assumes to be compatible with available k8s versions. However for using status subresource, please use k8s version 1.14 or above.
- Spark versions 3.4 or above
Member comment:

Apache Spark 3.4.x reaches the end-of-life very soon (2024-10-13).


### Compatibility

- JDK17
Member comment:

Java 17 and 21.

### Compatibility

- JDK17
- Operator used fabric8 which assumes to be compatible with available k8s versions. However for using status subresource, please use k8s version 1.14 or above.
@dongjoon-hyun (Member) commented on Sep 5, 2024:

As I pinged you already, the K8s ecosystem is moving fast in the public environment.

Just FYI, in the community, please don't claim what you didn't test explicitly.

- Operator used fabric8 which assumes to be compatible with available k8s versions. However for using status subresource, please use k8s version 1.14 or above.
- Spark versions 3.4 or above

## Manage Your Spark Operator
Member comment:

Remove this section because it's duplicated.

| operatorRbac.role.create | Whether to create Role for operator to use. At least one of `clusterRole.create` or `role.create` should be enabled | true |
| operatorRbac.roleBinding.create | Whether to create RoleBinding for operator to use. At least one of `clusterRoleBinding.create` or `roleBinding.create` should be enabled | true |
| operatorRbac.clusterRole.configManagement.roleName | Role name for operator configuration management (hot property loading and leader election) | `spark-operator-config-role` |
| appResources.namespaces.create | Whether to create dedicated namespaces for Spark apps. | `spark-operator-config-role-binding` |
Member comment:

Shall we add clusterResources first before adding this document? It looks a little weird because the document is missing one part while Apache Spark Operator supports both the SparkApp CRD and the SparkCluster CRD.

@jiangzho (Contributor, author):

Yep! Added a short field in the Spark Custom Resources page to start with. Also created SPARK-49528 to better document the template support for clusters.

@jiangzho (Contributor, author):

Actually +1 for the point - appResources can be a bit misleading, since it may serve both SparkApp and SparkCluster. It was introduced to indicate that these resources are for running Spark workloads (as opposed to the resources created for the operator deployment itself).

I shall fix this in SPARK-49623.

settings.gradle (outdated):
include 'spark-operator-api'
include 'spark-submission-worker'
include 'spark-operator'
include 'spark-operator-docs'
Member comment:

Shall we move this into the build-tools directory? In addition, spark-operator-docs sounds like a bit of an overclaim because it has only ConfOptionDocGenerator while the documentation has more content.

@jiangzho (Contributor, author):

Refactored to SPARK-49527

@dongjoon-hyun (Member):

Thank you for updating this.

@jiangzho marked this pull request as ready for review on September 13, 2024 at 00:44
commandLine "java", "-classpath", sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass, docsPath
}

build.finalizedBy(generateConfPropsDoc)
@jiangzho (Contributor, author):

This ensures the generated doc is updated on every Gradle build when a new conf is introduced, if any.

# Design & Architecture

**Spark-Kubernetes-Operator** (Operator) acts as a control plane to manage the complete
deployment lifecycle of Spark applications. The Operator can be installed on a Kubernetes
Member comment:

This was correct, but not any more because we added the SparkCluster CRD.

I guess we need to revise README.md, too.

@jiangzho (Contributor, author):

Thanks a lot for the review and sorry for the late response!

I updated this, making a best effort to cover SparkCluster as well.

namespace and controls Spark deployments in one or more managed namespaces. The custom resource
definition (CRD) that describes the schema of a SparkApplication is a cluster wide resource.
For a CRD, the declaration must be registered before any resources of that CRDs kind(s) can be
used, and the registration process sometimes takes a few seconds.
Member comment:

Let's remove the CRD details. They distract from the Apache Spark K8s Operator explanation. We had better link to the K8s CRD document.

Users can interact with the operator using the kubectl or k8s API. The Operator continuously
Member comment:

Let's not assume that users are unaware of kubectl and helm. They are very common these days, even in ASF projects.

Users can interact with the operator using the kubectl or k8s API.

tracks cluster events relating to the SparkApplication custom resources. When the operator
receives a new resource update, it will take action to adjust the Kubernetes cluster to the
desired state as part of its reconciliation loop. The initial loop consists of the following
high-level steps:
Member comment:

Please try to rewrite the above paragraph into a single sentence.


* User submits a SparkApplication custom resource(CR) using kubectl / API
Member comment:

Let's be clear that this is one of two cases: SparkApplication and SparkCluster.
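For readers following this thread, a minimal SparkApplication submission might look like the sketch below. Only the `runtimeVersions` block is taken from an excerpt quoted later in this review; the API group/version, metadata, and the remaining spec fields are illustrative assumptions rather than the operator's confirmed schema.

```yaml
# spark-pi.yaml -- illustrative sketch only; apart from runtimeVersions,
# the field names and the API group/version are assumptions, not the verified CRD.
apiVersion: spark.apache.org/v1alpha1   # assumed group/version
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-apps                 # assumed operator-managed namespace
spec:
  mainClass: org.apache.spark.examples.SparkPi   # assumed field name
  runtimeVersions:
    scalaVersion: "2.13"
    sparkVersion: "4.0.0-preview1"
# Submit with: kubectl apply -f spark-pi.yaml
```

An equivalent SparkCluster resource would be submitted the same way, differing only in `kind` and its spec fields.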

desired state until the
current state becomes the desired state. All lifecycle management operations are realized
using this very simple
principle in the Operator.
Member comment:

Please remove "All lifecycle management operations are realized using this very simple principle in the Operator."


## State Transition

[<img src="resources/state.png">](resources/state.png)
Member comment:

PNG is not editable. When we need to update this in the future, how can we do that?

@jiangzho (Contributor, author):

Diagrams are created with draw.io, which allows importing and updating PNGs for simple diagrams. Would you suggest we add this to the docs as well? It's not user facing but may help future work.

[<img src="resources/state.png">](resources/state.png)

* Spark application are expected to run from submitted to succeeded before releasing resources
* User may configure the app CR to time-out after given threshold of time
Member comment:

Is this in the diagram?

@jiangzho (Contributor, author):

The application diagram tried to cover timeout blocks as well

* Spark application are expected to run from submitted to succeeded before releasing resources
* User may configure the app CR to time-out after given threshold of time
* In addition, user may configure the app CR to skip releasing resources after terminated. This is
typically used at dev phase: pods / configmaps. etc would be kept for debugging. They have
@dongjoon-hyun (Member) commented on Sep 17, 2024:

"pods / configmaps. etc"? Although the meaning is clear, could you revise this grammatically?

To enable hot properties loading, update the **helm chart values file** with

```
Member comment:

Redundant empty line.

# ... all other config overides...
dynamicConfig:
create: true
Member comment:

Redundant empty line.
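Assembling the fragments quoted above, and dropping the empty lines flagged in the review, the hot-properties override might read as the sketch below; the indentation and surrounding comment are reconstructed, and only `dynamicConfig.create` comes from the quoted diff.

```yaml
# Helm chart values override enabling hot properties loading (sketch).
# Only dynamicConfig.create appears in the quoted diff; the rest is placeholder context.
# ... all other config overrides ...
dynamicConfig:
  create: true
```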


## Config Metrics Publishing Behavior

Spark Operator uses the same source & sink interface as Apache Spark. You may
Member comment:

source & sink -> source and sink

under the License.
-->

# Metrics
Member comment:

Why do we have a new file for this? I'd recommend including this content in configuration.md.

under the License.
-->

# Operator Probes
Member comment:

Please remove this section.

* operator runtimeInfo health state
* Sentinel resources health state

### Operator Sentinel Resource
Member comment:

Please move this section into operations.md only and remove operator_probes.md completely.

runtimeVersions:
scalaVersion: "2.13"
sparkVersion: "4.0.0-preview1"
Member comment:

Redundant empty line.


## Config Metrics Publishing Behavior

Spark Operator uses the same source & sink interface as Apache Spark. You may
Member comment:

If Spark has a corresponding doc, it is better to add a hyperlink here.

the [Dropwizard Metrics Library](https://metrics.dropwizard.io/4.2.25/). Note that Spark Operator
does not have Spark UI, MetricsServlet
and PrometheusServlet from org.apache.spark.metrics.sink package are not supported. If you are
interested in Prometheus metrics exporting, please take a look at below section `Forward Metrics to Prometheus`
Member comment:

Add a hyperlink?


## Forward Metrics to Prometheus

In this section, we will show you how to forward spark operator metrics
Member comment:

Suggested change:
- In this section, we will show you how to forward spark operator metrics
+ In this section, we will show you how to forward Spark Operator metrics

Comment on lines 64 to 65
* Modify the
build-tools/helm/spark-kubernetes-operator/values.yaml file' s metrics properties section:
Member comment:

Suggested change:
- * Modify the
-   build-tools/helm/spark-kubernetes-operator/values.yaml file' s metrics properties section:
+ * Modify the metrics properties section in the file
+   `build-tools/helm/spark-kubernetes-operator/values.yaml`:

sink.PrometheusPullModelSink
```

* Install the Spark Operator
Member comment:

Suggested change:
- * Install the Spark Operator
+ * Install Spark Operator
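Putting the quoted pieces together, the metrics section of `build-tools/helm/spark-kubernetes-operator/values.yaml` might resemble the sketch below before installing the chart; the key path and the sink package are assumptions inferred from this thread, and only the `PrometheusPullModelSink` name comes from the quoted excerpt.

```yaml
# values.yaml metrics override (sketch). Key names are assumptions; replace
# <sink-package> with the operator's actual sink package.
metrics:
  properties: |
    *.sink.prometheus.class=<sink-package>.sink.PrometheusPullModelSink
```

With an override like this in place, installing the chart would configure the operator to expose metrics for Prometheus to pull, as described in the "Forward Metrics to Prometheus" section under review.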

@@ -0,0 +1,203 @@
## Spark Operator API
Member comment:

License header?

@jiangzho (Contributor, author):

Thanks for the catch!

We have disabled the license header check for markdown files, but I added the header back for files under /docs for consistency.

@dongjoon-hyun (Member) left a comment:

+1, LGTM. Let's merge this as the initial draft.

@dongjoon-hyun (Member):

Thank you, @jiangzho and @viirya.

jiangzho added a commit to jiangzho/spark-kubernetes-operator that referenced this pull request Jul 17, 2025
bump internal version to 0.4.0.1