Skip to content
Closed
Changes from 44 commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
64bbe8c
Add RFC for basic schema requirements in cluster apps
marians Nov 25, 2022
c2bba6a
Add TOC
marians Dec 1, 2022
7284043
Maintain uppercase MUST
marians Dec 8, 2022
6c9dc6a
Cahnge "below" to "above"
marians Dec 8, 2022
679a970
Change "brakes" to "breaks"
marians Dec 8, 2022
3a1eedb
Mention enum to list valid values
marians Dec 8, 2022
636a28a
Extended R8 on required properties
marians Dec 8, 2022
736151f
Replace payload with instance
marians Dec 8, 2022
9c6e189
Update TODO list
marians Dec 8, 2022
f4e634a
Add requirement regarding anyOf/oneOf
marians Dec 12, 2022
afe2672
Add link to R9
marians Jan 10, 2023
296d9cd
Add R10 on how to use deprecated=true
marians Jan 10, 2023
25e0fa1
Clarify description requirements
marians Jan 10, 2023
314d99e
Remove line on patternProperties
marians Jan 10, 2023
57af1a0
Fix exclusiveMaximum -> exclusiveMinimum
marians Jan 10, 2023
c0b6eff
Require draft 2020-12
marians Jan 10, 2023
02fe1e1
Add TODO
marians Jan 11, 2023
3073423
Update and improve requirements regarding title (R4)
marians Jan 16, 2023
ac661ee
Update description requirements (R5)
marians Jan 16, 2023
9ce8eed
Fix: example -> examples
marians Jan 16, 2023
bdd7107
Add R11 on providing distinct string values
marians Jan 17, 2023
6dc44d8
Update requirement regarding type
marians Jan 19, 2023
70c15b4
Add details to title requirements
marians Jan 19, 2023
e906823
Add details to description requirements
marians Jan 19, 2023
1bb5ae9
Require draft 2019-09 instead of 202-12
marians Jan 30, 2023
a075697
Add TODO items
marians Feb 3, 2023
0d6ba39
Clarify examples requirements
marians Feb 3, 2023
83f0c87
Clarify requirements for string and numeric field restrictions
marians Feb 3, 2023
9630507
Add type requirements
marians Feb 5, 2023
4720417
Add requirement to void recursion
marians Feb 5, 2023
2b7e16e
Add requirement to avoid if, then, else
marians Feb 5, 2023
9363c0b
Add requirement to avoid unevaluated properties/items
marians Feb 5, 2023
9d9a8a5
Add requirement to avoid tuple validation
marians Feb 5, 2023
d424fac
Add requirement regarding 'contains'
marians Feb 5, 2023
35031cb
Fix casing
marians Feb 6, 2023
4c0f5ae
Refine R9 (allOf, anyOf)
marians Feb 7, 2023
7163aab
Boolean properties don't need examples
marians Feb 13, 2023
0a3d632
Change draft to 2020-12
marians Feb 13, 2023
a55d389
Update section numbering
marians Feb 13, 2023
c5fdaff
Add R17 on common schema structure
marians Feb 13, 2023
d2a2f35
Rephrase and refine R3
marians Mar 1, 2023
211488e
Extend R17 for compatibility reasons
marians Mar 1, 2023
172a47b
R17: Add /cluster-shared to allowed properties
marians Mar 1, 2023
4b54af0
Fix table headings
marians Mar 2, 2023
10637bc
Update R7 on examples
marians Mar 9, 2023
521379e
Add missing condition to R7
marians Mar 9, 2023
650c139
Fix typos
marians Mar 9, 2023
d345df7
Add R18 for empty defaults
mogottsch Apr 11, 2023
3ee3e0b
Fix r18 heading
mogottsch Apr 11, 2023
6a26e98
Adapt common schema structure requirements to cluster-aws
mogottsch Apr 12, 2023
29e256a
Add recommendations for default values (merge vs. replace)
AndiDog Apr 17, 2023
0f8706d
Add recommendations for node pools
AndiDog Apr 18, 2023
6545e82
Link to gitops-template example of using different environments and m…
AndiDog Apr 18, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
384 changes: 384 additions & 0 deletions cluster-app-values-schema/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,384 @@
# Basic requirements for cluster app values schema

Cluster apps are apps which are used to configure and provision clusters based on Cluster API in the Giant Swarm platform. Like all helm charts, these apps are configured using so-called values YAML files, which are passed to the app as a ConfigMap for cluster creation. The allowed structure of the values YAML is defined by the app's values schema.

This RFC defines basic requirements for all cluster apps provided by Giant Swarm.

## Overview

- [R1: JSON Schema dialect (draft 2020-12) must be specified](#r1)
- [R2: A single type must be declared](#r2)
- [R3: Explicitly cover all allowed properties](#r3)
- [R4: Array item schema must be defined](#r4)
- [R5: Properties must have a title](#r5)
- [R6: Properties should have descriptions](#r6)
- [R7: Properties should provide examples](#r7)
- [R8: Constrain values as much as possible](#r8)
- [R9: Required properties must be marked as such](#r9)
- [R10: Use `anyOf` and `oneOf` only for specific purposes](#r10)
- [R11: Use `deprecated` to phase out properties](#r11)
- [R12: Provide valid string values where possible](#r12)
- [R13: Avoid recursion](#r13)
- [R14: Avoid logical constructs using `if`, `then`, `else`](#r14)
- [R15: Avoid `unevaluatedProperties`, `unevaluatedItems`](#r15)
- [R16: Array items and tuple validation](#r16)
- [R17: Common schema structure](#r17)

## Background

TODO: Some more info regarding the rationale.

- In-place documentation of properties
- Better validation to catch configuration errors before they are applied
- Generated entry forms
- Generated information display
- Enabling processes and communication about explicit contracts instead of implicit behaviour

## Definitions

- **Cluster app**: App (as defined by the Giant Swarm app platform) that allows for provisioning a cluster in a Giant Swarm installation.
- **JSON Schema**: declarative language that allows to annotate and validate documents that are compatible with JSON.
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.html)

## Requirements

Requirements carry identifiers with prefix `R` and a unique number, for easy referencing.

### R1: JSON Schema dialect (draft 2020-12) must be specified {#r1}

Each cluster app's values schema file MUST specify the schema dialect (also called draft) used via the `$schema` keyword on the top level, with the draft URI as a value.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning and possible values of specific keywords changed over the course of different drafts. Supporting all drafts would increase the complexity of our tooling, e.g. schemalint. I would suggests that we initially only support the newest draft (2020-12) and add support for newer drafts as they are released.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest restrictions we discussed (e.g. forbid prefixItems), does it still make sense to require 2019-09? Or can we go back to 2020-12?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we make sure that we are indifferent to all the changes between 2019-09 and 2020-12.
That would be:

  • forbid prefixItems (2020)
  • forbid additionalItems (2019)
  • forbid items with tuple values (2019)
  • forbid $dynamicRef and $dynamicAnchor (2020)
  • forbid $recursiveRef and $recursiveAnchor (2019)
    As far as I know, we already wanted to forbid all of these. So we could go back to 2020-12.


The draft URI MUST be `https://json-schema.org/draft/2020-12/schema`.

Example:

```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
...
}
```

### R2: A single type must be declared {#r2}

For each property, including the root level schema, the `type` keyword MUST be present, and there MUST be a single value for the `type` keyword.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need type, when $ref is specified. For example:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "address": {
      "$ref": "address.json"
    },
  "type": "object"
}

Here I would expect that we don't require a type for address.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, true. Do you think this should be mentioned? To me that's covered by the definition of $ref in JSON Schema.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important for the implementation of this rule in schemalint. JSON Schema does not forbid using type next to $ref. However, I think leaving out type when $ref is present is so intuitive, that we don't necessarily need to mention it here.


While JSON Schema draft 2020-12 allows multiple values for the `type` keyword, for cluster-apps this would complicate the generation of user interfaces. Hence we require a single type to be specified.

For validation, there is no difference between an array with a single member or a single string. The two following examples can be considered identical.

```json
{
"type": "string",
"title": "Name"
}
```

```json
{
"type": ["string"],
"title": "Name"
}
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add negative examples, or particularly recommendations regarding null?

### R3: Explicitly cover all allowed properties {#r3}

All properties that can be used in values MUST be covered in the schema explicitly.

To enforce this and to disable the use of any undefined properties, the keyword `additionalProperties` MUST be set to `false` on the schema's root object.

For objects deeper in the schema hierarchy, keyword `additionalProperties` SHOULD be set to `false`, too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we need to adapt. additionalProperties can also be set to a schema.
It should be always set and never to true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we use properties vs. additionalProperties. If set to an object, they both seem to do the same?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that was a beginner question I guess 😆 – additionalProperties is for when the object keys can be dynamic, such as in nodePools: { abc: ..., def: ... }.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. additionalProperties can be used if the keys can be arbitrary and there is also patternProperties, which uses a regex pattern to restrict the keys. This one is also used in our schemas, AFAIK.


### R4: Array item schema must be defined {#r4}

The items schema for all array properties MUST be defined using the `items` keyword.

### R5: Properties must have a title {#r5}

Each property MUST be annotated via the `title` keyword.

The `title` keyword provides a user friendly label for the property, to be displayed in a user interface instead of the technical property name. The title should be chosen to match the wording used in documentation, especially in prosa text (as opposed to configuration examples).

Example:

```json
{
...
"properties": {
"name": {
"type": "string",
"title": "Cluster name",
...
}
}
}
```

Additional requirements apply to the title value:

- The title MUST be written using sentence case capitalization. Correct: "Cluster name", incorrect: "Cluster Name".
- The title MUST NOT contain punctuation marks, leading or trailing white space, control characters, tabs, nor multiple whitespaces in a row.
- If the title annotates a property that is part of another object, the title SHOULD NOT include the parent property name, to avoid repetition.
- Example: with an object `/controlPlane` that is titled `Control plane`, the property `/controlPlane/availabilityZones` should be titled `Availability zones`, not `Control plane availability zones`.

### R6: Properties should have descriptions {#r6}

Each property SHOULD be annotated and via the `description` keyword.

Example:

```json
{
...
"properties": {
"name": {
"type": "string",
"description": "Identifies this cluster uniquely within the installation.",
...
}
}
}
```

If a description is given, additional requirements apply to the value:

- Description content MUST NOT include any line breaks.
- Descriptions MUST NOT contain formatting code like e.g. Markdown or HTML, leading or trailing white space, control characters, tabs, nor multiple whitespaces in a row.
- Descriptions MUST use sentence case capitalization and punctuation.
- The description SHOULD NOT repeat the property name or title. Correct: "Identifies this cluster uniquely within the installation.", incorrect: "Cluster name identifies this cluster uniquely within the installation.".
- Descriptions SHOULD be between 50 and 200 characters long.
- Descriptions SHOULD be written in simple language.

### R7: Properties should provide examples {#r7}

Each property (except for type `boolean`) SHOULD provide one or more examples using the `examples` keyword.

Example:

```json
{
...
"properties": {
"name": {
"type": "string",
"examples": ["devel001"],
...
}
}
}
```

An example can give users easy-to-understand guideance on how to use a property. Ideally, examples SHOULD be valid cases, fulfilling the contstraints of the property.

Multiple examples can be provided. Per property, there SHOULD be at least one example. Additional examples should only be provided to indicate the range of different values possible. There SHOULD NOT be more than five examples per property.

TODO: We could decide to use the examples for testing purposes, replacing ci-values.yaml and the likes. In that case, we should make it a MUST requirement.

### R8: Constrain values as much as possible {#r8}

Property schema SHOULD explicitly restrict values as much as possible.

There are several ways this requirement can/has to be fulfilled, depending on the property type and the use case.

String properties SHOULD be constrained by at least one of the keywords `const`, `enum`, `pattern`, `minLength`, or `maxLength`, or `format`.

Numeric properties (type `number`, `integer`) SHOULD be constrained by at least one of the keywords `minimum` or `exclusiveMinimum`, `maximum` or `exclusiveMaximum`.

Example:

```json
"diskSizeInGb": {
"type": "integer",
"minimum": 10,
"maximum": 200,
"multipleOf": 10
}
```

### R9: Required properties {#r9}

If a property is required to be set by the creator of a cluster, it MUST be included in the `required` keyword.

Properties that have default values defined in the app's `values.yaml` file MUST NOT be set as required, as here the default values are applied in case no user value is given.

Properties that get default values assigned via some external mechanism (e.g. an admission controller) also MUST NOT be set as required. An example here would be the name of a cluster or node pool, where a unique name would be generated in case none is given.

Note: If property of type object named `a` has required properties, this does not indicate that `a` itself must be defined in the instance. However it indicates that if `a` is defined, the required properties must be defined, too.

### R10: Use `anyOf` and `oneOf` only for specific purposes {#r10}

The keywords `anyOf` and `oneOf` allow definition of multiple subschemas, where the payload must match the constraints of either one (`oneOf`) or any number of (`anyOf`) subschemas. For user interface generation, this creates great complications, hence we strongly restrict the use of these features.

A cluster app MAY only make use of the `anyOf` or `oneOf` keyword in the following ways:

1. to specify validation constraints via subschemas
2. to declare one or more subschemas as deprecated

#### (1) Specifying validation constraints via subschemas

The idea here is that each subschema only defines constraints for the validation of the payload.

In this case, the subschemas MUST NOT contain any of the following keywords:

- `type` declaration
- annotations (`title`, `description`, `examples`)
- `properties`, `patternProperties`, `additionalProperties`
- `items`, `additionalItems`

The following examples shows a string property with two subschemas, where each one has a different validation pattern.

```json
"diskSize": {
"type": "string",
"title": "Volume size",
"examples": ["10 GB", "10000 MB"],
"oneOf": [
{
"pattern": "^[0-9]+ GB$"
},
{
"pattern": "^[0-9]+ MB$"
}
]
}
```

#### (2) Declaring a subschema as deprecated

If the schema of a property changes over time, while keeping the name of the property, the use of subschemas in combination with the `deprecated` keyword can help phase out an old schema and introduce a new one.

As an exception to the rules defined in (1) above, the subschemas MAY use the JSON Schema keywords forbidden under (1) if `"deprecated": true` is present in all except one of the subschemas.

In the case of a generated user interface, only the non-deprecated subschema will be applied. All other subschemas will be ignored.

Example:

```json
"replicas": {
"anyOf": [
{
"type": "string",
"deprecated": true,
"$comment": "to be removed in the next major version, please use the integer type instead"
},
{
"type": "integer",
"maximum": 100,
"title": "Size"
}
]
}
```

## R11: Use `deprecated` to phase out properties {#r11}

To indicate that a property is supposed to be removed in a future version, the property SHOULD carry the `deprecated` key with the value `true`.

As a consequence, user interfaces for cluster creation SHALL NOT display the according form field for data entry. However, providing the property within values YAML will not cause any failure.

In addition, it is RECOMMENDED to add a `$comment` key to the property, with information regarding

- which property will replace the deprecated one (if any)
- when the property will be removed

Note that `$comment` content is not intended for display in any UI nor processing in any tool. It is mainly targeting schema developers and anyone using the schema itself as a source of information.

## R12: Provide valid string values where possible {#r12}

For string properties, in some cases only a few values are considered valid. In this case, the schema SHOULD specify these selectable values in one of the following forms:

1. Using `enum` in case the selectable value does not require a more user-friendly label.
2. Using `oneOf` with a combination of `title` and `const`, in case a user-friendly label is needed in addition to the value. See below for more details.

As an example of the `enum` method, we use a string property with two possible values. The values `active` and `inactive` are considered to be speaking for themselves.

```json
"autoRefresh": {
"type": "string",
"title": "Auto refresh",
"enum": ["active", "inactive"]
}
```

However, if the string values are not considered self-explanatory, the `oneOf` method can be used to provide user-friendly labels in addition to the axctual value. Here, `oneOf` is an array of tuples, in which `title` contains the user-friendly description and `const` provides the value.

```json
"imagePullPolicy": {
"type": "string",
"title": "Pull policy",
"description": "Whether the container image should be pulled from a registry",
"oneOf": [
{"const": "IfNotPresent", "title": "Pull image if not present in the node"},
{"const": "Always", "title": "Always pull up-to-date image"},
{"const": "Never", "title": "Never pull image, use image available locally"}
]
}
```

### R13: Avoid recursion {#r13}

The JSON Schema keywords `dynamicRef`, `dynamicAnchor`, and `recursiveRef` MUST NOT be used.

### R14: Avoid logical constructs using `if`, `then`, `else` {#r14}

The JSON Schema keywords `if`, `then`, and `else` MUST NOT be used.

### R15: Avoid `unevaluatedProperties`, `unevaluatedItems` {#r15}

The JSON Schema keywords `unevaluatedProperties` , `unevaluatedItems` MUST NOT be used.

### R16: Array items and tuple validation {#r16}

All array items MUST be of the same type.

The JSON Schema keywords `contains`, `additionalItems` and `prefixItems` MUST NOT be used.

### R17: Common schema structure {#r17}

For cluster apps, we aim to use a common schema structure among the providers we support. This structure accounts for the necessary differences that can occur between providers, as some concepts and implementations are provider-specific.

Each cluster app values schema MUST offer the following root level properties:

| Property name | Property type | Description |
|-|-|-|
| `metadata` | object | Descriptive settings like name, description, cluster labels (e. g. service priority), owner organization |
| `connectivity` | object | Settings related to connectivity and networking, and defining how the cluster reaches the outside world and how it can be reached. |
| `controlPlane` | object | Configuration of the cluster's control plane, Etcd, API server and more. |
| `nodePools` | array | Configuration of node pools (groups of worker nodes), regardless of their implentation flavour. |

In addition, the cluster app values schema SHOULD offer the following root level properties:

| Property name | Property type | Description |
|-|-|-|
| `internal` | object | Settings which are not supposed to be configured by end users, and which won't be exposed via user interfaces. Also experimental features that undergo schema changes. |
| `providerSpecific` | object | Configuration specific to the infrastructure provider. |

For compatibility reasons, the schema MAY have the following properties in the root level:

- `managementCluster`
- `baseDomain`
- `provider`
- `cluster-shared`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


The schema MUST NOT define any other properties on the root level, in addition to the ones mentioned above.

Copy link

@mogottsch mogottsch Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we forbid empty objects, I suggest adding that to the RFC (related issue).

Suggested change
### R19: Avoid empty objects {#r19}
All objects MUST specify at least one property through at least one of the following keywords: `properties`, `additionalProperties`, `patternProperties`.

## TODO

I haven't gotten to these yet, or I'm not sure about them.

- Should schemas have the `$id` property defined? And if yes, to what?

- Use of the `default` keyword. As per JSON Schema documentation, it is for annotation only. It is not meant to automatically fill in missing values for validation.

- How to specify whether a property can be modified or not. `"readOnly": true` mit be the right one for that.

- Clarify use of `Not`.

- dependentRequired, dependentSchemas: to be evaluated.

- contentEncoding in combination with contentSchema: to be defined.
- see "username:password" in base 64 example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "How to verify/format JSON schema files" (in your editor)

## Resources

- [Understanding JSON Schema](https://json-schema.org/understanding-json-schema/)