
Commit f8123a6

feat: Column functions fixes #54
+ Typed columns support
1 parent f181fc5 commit f8123a6

File tree

5 files changed: +944 −14 lines changed
  • kotlin-spark-api
    • 2.4/src
      • main/kotlin/org/jetbrains/kotlinx/spark/api
      • test/kotlin/org/jetbrains/kotlinx/spark/api
    • 3.0/src
      • main/kotlin/org/jetbrains/kotlinx/spark/api
      • test/kotlin/org/jetbrains/kotlinx/spark/api


README.md

Lines changed: 52 additions & 0 deletions
@@ -20,6 +20,7 @@ We have opened a Spark Project Improvement Proposal: [Kotlin support for Apache
 - [withSpark function](#withspark-function)
 - [withCached function](#withcached-function)
 - [toList and toArray](#tolist-and-toarray-methods)
+- [Column infix/operator functions](#column-infixoperator-functions)
 - [Examples](#examples)
 - [Reporting issues/Support](#reporting-issuessupport)
 - [Code of Conduct](#code-of-conduct)
@@ -138,6 +139,57 @@ to call the `map` method and collect the resulting `Dataset`.

For more idiomatic Kotlin code we've added `toList` and `toArray` methods to this API. You can still use the `collect` method as in the Scala API, but the result has to be cast to `Array`, because `collect` returns a Scala array, which is not the same as a Java/Kotlin one.
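
As a hedged illustration of the difference (`withSpark` and `dsOf` come from this API; the exact generic signatures of `toList`/`toArray` are an assumption here, not shown in this diff):

```kotlin
import org.jetbrains.kotlinx.spark.api.*

fun main() = withSpark {
    val ds = dsOf(1, 2, 3)
    val asList = ds.toList<Int>()    // an idiomatic Kotlin List<Int>
    val asArray = ds.toArray<Int>()  // a Java/Kotlin Array<Int>, not a Scala array
    // ds.collect() would hand back a Scala array, hence the cast mentioned above.
    println(asList.size + asArray.size)
}
```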

### Column infix/operator functions

Similar to the Scala API for `Column`s, many of the operator functions have been ported over. For example:

```kotlin
dataset.select( col("colA") + 5 )
dataset.select( col("colA") / col("colB") )

dataset.where( col("colA") `===` 6 )
// or alternatively
dataset.where( col("colA") eq 6 )
```
In short, all supported operators are (a combined sketch follows the list):

- `==`,
- `!=`,
- `eq` / `` `===` ``,
- `neq` / `` `=!=` ``,
- `-col(...)`,
- `!col(...)`,
- `gt`,
- `lt`,
- `geq`,
- `leq`,
- `or`,
- `and` / `` `&&` ``,
- `+`,
- `-`,
- `*`,
- `/`,
- `%`
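
To make the list concrete, here is a small hedged sketch combining several of these operators in the same snippet style as above; `dataset` and the column names are placeholders, not part of the commit:

```kotlin
// Assumed: `dataset` has numeric columns colA and colB, and a boolean column colC.
dataset.where(
    (col("colA") gt 5) and (col("colB") leq 10) or !col("colC")
)
// Unary minus, multiplication, and modulo compose as ordinary expressions.
dataset.select( col("colA") % 2, -col("colB") * 3 )
```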
Secondly, there are some quality-of-life additions as well:

In Kotlin, ranges are the idiomatic way to express inclusive/exclusive bounds, so you can now do:

```kotlin
dataset.where( col("colA") inRangeOf 0..2 )
```

Also, for columns containing map- or array-like types:

```kotlin
dataset.where( col("colB")[0] geq 5 )
```
Finally, thanks to Kotlin reflection, we can provide a type- and refactor-safe way to create `TypedColumn`s and, with those, a new `Dataset` from pieces of another, using the `selectTyped()` function added to the API:

```kotlin
val dataset: Dataset<YourClass> = ...
val newDataset: Dataset<Pair<TypeA, TypeB>> = dataset.selectTyped(col(YourClass::colA), col(YourClass::colB))
```
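
For a fuller picture, here is a hypothetical end-to-end sketch; the `Person` class and the `withSpark`/`dsOf` setup are illustrative assumptions, not content from this commit:

```kotlin
import org.apache.spark.sql.Dataset
import org.jetbrains.kotlinx.spark.api.*

// Illustrative data class; any Kotlin data class works the same way.
data class Person(val name: String, val age: Int)

fun main() = withSpark {
    val people = dsOf(Person("Alice", 29), Person("Bob", 31))

    // Property references keep this refactor-safe: renaming Person::age
    // in the IDE also updates this selection.
    val namesAndAges: Dataset<Pair<String, Int>> =
        people.selectTyped(col(Person::name), col(Person::age))

    namesAndAges.show()
}
```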
## Examples

For more, check out the [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples) module.
