1 change: 1 addition & 0 deletions docs/StardustDocs/d.tree
@@ -12,6 +12,7 @@
<toc-element topic="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md">
<toc-element topic="Trobleshooting.md"/>
</toc-element>
<toc-element topic="Guide-for-backend-SQL-developers.md"/>
</toc-element>

<toc-element topic="Setup.md" accepts-web-file-names="gettingstarted">
245 changes: 245 additions & 0 deletions docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md
@@ -0,0 +1,245 @@
# Kotlin DataFrame for SQL & Backend Developers

<web-summary>
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
</web-summary>

<card-summary>
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
</card-summary>

<link-summary>
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
</link-summary>

This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar
SQL and ORM operations to DataFrame concepts.

If you plan to work on a Gradle project without a Kotlin Notebook,
we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
This plugin generates type-safe schemas at compile time,
tracking schema changes throughout your data pipeline.

## Add Kotlin DataFrame Gradle dependency

You can read more about setting up the Gradle build in the [Gradle Setup Guide](SetupGradle.md).

In your Gradle build file (`build.gradle` or `build.gradle.kts`), add the Kotlin DataFrame library as a dependency:

<tabs>
<tab title="Kotlin DSL">

```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```

</tab>

<tab title="Groovy DSL">

```groovy
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```

</tab>
</tabs>

---

## 1. What is a dataframe?

If you’re used to SQL, a **dataframe** is conceptually like a **table**:

- **Rows**: ordered records of data
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types

Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) —
columns can contain *[nested dataframes](DataColumn.md#framecolumn)* or *column groups*,
allowing you to represent and transform tree-like structures without flattening.
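
As a rough illustration of a column group, the sketch below nests two flat columns under a single `name` column. The column names and sample values are made up, and the snippet uses the String API so that it does not depend on the compiler plugin or on notebook-generated extension properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Flat, table-like data (hypothetical sample values)
val flat = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
)

// Nest two columns under a "name" column group
val nested = flat.group("firstName", "lastName").into("name")

// The group is one top-level column that holds two child columns
println(nested.columnNames())
println(nested.getColumnGroup("name").columnNames())
```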

Unlike a relational DB table:

- A DataFrame object **lives in memory** — there’s no storage engine or transaction log
- It’s **immutable** — each operation produces a *new* DataFrame
- There is **no concept of foreign keys or relations** between DataFrames
- It can be created from
*any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md),
or in-memory objects
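
The immutability point can be seen in a small sketch like the one below. It assumes a Kotlin Notebook or script with the library on the classpath; the data and the `ageNextYear` column are purely illustrative.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// An in-memory dataframe built from literal values (hypothetical data)
val people = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// add() does not modify `people`; it returns a new dataframe with an extra column
val withNextAge = people.add("ageNextYear") { "age"<Int>() + 1 }

println(people.columnNames())      // the original dataframe is unchanged
println(withNextAge.columnNames()) // the new dataframe has the added column
println(withNextAge.schema())      // column names mapped to their types
```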

---

## 2. Reading Data From SQL

Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.

| Approach | Example |
|----------------------------------|---------------------------------------------------------------------|
| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` |
| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` |

```kotlin
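import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret"
)

// Table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query
val salesByRegion = DataFrame.readSqlQuery(
    dbConfig,
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """
)

// From an existing JDBC connection (opened here with plain JDBC for illustration)
val connection = DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password)
val orders = connection.readDataFrame("SELECT * FROM orders")

// From a ResultSet
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
val ordersFromResultSet = rs.readDataFrame(connection)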
```

For more details, see the [guide to reading from SQL databases](readSqlDatabases.md).
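
Because the whole result is loaded into memory, it can help to look at a small sample first. The sketch below assumes the same hypothetical PostgreSQL settings as above, a `customers` table, and a JDBC driver on the classpath; it uses the `limit` parameter of `readSqlTable` to cap the number of rows.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// Hypothetical connection settings; replace them with your own
val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret",
)

// Read at most 100 rows instead of the whole table
val preview = DataFrame.readSqlTable(dbConfig, "customers", limit = 100)

println(preview.schema()) // inferred column names and types
preview.head(5).print()   // first five rows
```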

## 3. Why It’s Not an ORM

Frameworks like **[Hibernate](https://hibernate.org/orm/)** or **[Exposed](https://github.com/JetBrains/Exposed)**:

- Map DB tables to Kotlin objects (entities)
- Track object changes and sync them back to the database
- Focus on **persistence** and **transactions**

Kotlin DataFrame:

- Has no persistence layer
- Doesn’t try to map rows to mutable entities
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
- Its **main idea** is that the schema *changes together with your transformations*, and the
[**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
- You don’t have to define or recreate schemas manually every time: the plugin infers them from the data or
transformations.
- By contrast, the mapping layer of an ORM is **frozen**: schema changes require manual model edits and migrations.

Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
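
To make the contrast with entity mapping concrete, here is a minimal sketch that turns a list of plain objects into a dataframe and then reshapes it. The `Sale` class stands in for whatever an ORM might return and is hypothetical, as are the values; the String API is used so the snippet does not rely on the compiler plugin.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// A plain Kotlin class, similar to an entity an ORM might return (hypothetical)
data class Sale(val region: String, val revenue: Double, val cost: Double)

val entities = listOf(
    Sale("North", 100.0, 60.0),
    Sale("South", 80.0, 50.0),
)

// Objects -> dataframe: one column per property
val sales = entities.toDataFrame()

// Each step returns a dataframe with a different schema; no model class is edited
val profits = sales
    .add("profit") { "revenue"<Double>() - "cost"<Double>() }
    .remove("revenue", "cost")

println(sales.schema())
println(profits.schema())
```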

---

## 4. Key Differences from SQL & ORMs

| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
|----------------------------|------------------------------------|------------------------------------|---------------------------------------------------------------------|
| **Storage** | Persistent | Persistent | In-memory only |
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations or defined manually |
| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin or defined manually |
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
| **Transactions** | Yes | Yes | Not applicable |
| **DB Indexes** | Yes | Yes (via DB) | Not applicable |
| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
| **Joins** | `JOIN` keyword | Eager/lazy loading | [`.join()` / `.leftJoin()` DSL](join.md) |
| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | [`.groupBy().aggregate()`](groupBy.md) |
| **Filtering** | `WHERE` | Criteria API / query DSL | [`.filter { ... }`](filter.md) |
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
| **Execution** | On DB engine | On DB engine | In JVM process |

---

## 5. SQL → Kotlin DataFrame Cheatsheet

### DDL Analogues

| SQL DDL Command / Example | Kotlin DataFrame Equivalent |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` |
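
The rows above can be chained into one pipeline. The following is a sketch on made-up data that reuses the column names from the table and relies on the String API; with the compiler plugin or in a notebook, the same calls could use generated properties such as `revenue` instead of `"revenue"<Double>()`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with the column names used in the table above
val sales = dataFrameOf("old_name", "old_col", "amount", "revenue", "cost")(
    "a", "x", 10, 100.0, 60.0,
    "b", "y", 20, 80.0, 50.0,
)

val result = sales
    .add("profit") { "revenue"<Double>() - "cost"<Double>() } // add a computed column
    .rename("old_name").into("new_name")                      // rename a column
    .remove("old_col")                                        // drop a column
    .convert("amount").to<Double>()                           // change a column type

println(result.schema())
```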

---

### DML Analogues

| SQL DML Command / Example | Kotlin DataFrame Equivalent |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum { amount } }` |
| `JOIN` | `.join(otherDf) { id match right.id }` |
| `LIMIT 5` | `.take(5)` |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.pivot { year }.groupBy { region }.aggregate { sum { amount } }` |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
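
A few of these rows combined, as a sketch on hypothetical `orders` and `customers` data. Column names such as `customerId` and `customerKey` are invented for the example, and the String API is used throughout.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data
val orders = dataFrameOf("id", "customerId", "amount")(
    1, 10, 250.0,
    2, 11, 40.0,
    3, 10, 120.0,
)
val customers = dataFrameOf("customerKey", "name")(
    10, "Alice",
    11, "Bob",
)

// SELECT id, amount FROM orders WHERE amount > 100 ORDER BY amount DESC LIMIT 5
val bigOrders = orders
    .filter { "amount"<Double>() > 100 }
    .sortByDesc("amount")
    .select("id", "amount")
    .take(5)

// SELECT ... FROM orders JOIN customers ON orders.customerId = customers.customerKey
val withNames = orders.join(customers) { "customerId" match "customerKey" }

bigOrders.print()
withNames.print()
```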

## 6. Example: SQL vs. DataFrame Side-by-Side

**SQL (PostgreSQL):**

```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC LIMIT 5;
```

**Kotlin DataFrame:**

```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
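
For readers who want to try the pipeline without a database, here is a self-contained sketch of the same query on invented data. It uses the String API (`"amount"<Double>()`, `groupBy("region")`) instead of generated properties, so it should not need the compiler plugin.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sales rows; one has a non-positive amount and is filtered out
val sales = dataFrameOf("region", "amount")(
    "North", 120.0,
    "South", 80.0,
    "North", 30.0,
    "East", -5.0,
    "South", 45.0,
)

val topRegions = sales
    .filter { "amount"<Double>() > 0 }                      // WHERE amount > 0
    .groupBy("region")                                      // GROUP BY region
    .aggregate { sumOf { "amount"<Double>() } into "total" } // SUM(amount) AS total
    .sortByDesc("total")                                    // ORDER BY total DESC
    .take(5)                                                // LIMIT 5

topRegions.print()
```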

## In Conclusion

- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it
**type-safe** and fully integrated into Kotlin.
- The main focus is **readability** and schema change safety via
the [Compiler Plugin](Compiler-Plugin.md).
- It is neither a database nor an ORM: the Kotlin DataFrame library does not store data or manage transactions but
works as an in-memory layer for analytics and transformations.
- It does not provide some SQL features (permissions, transactions, indexes), but it offers convenient tools for working
with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the
JVM, while keeping your code easily refactorable and IDE-assisted.
- Use Kotlin DataFrame for small and medium-sized datasets; for large datasets, consider using a more
**performant** database engine.

## What's Next?

If you're ready to go through a complete example, we recommend our **[Quickstart Guide](quickstart.md)**:
you'll learn the basics of reading data, transforming it, and creating visualizations step by step.

Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.

- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.

- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).

- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.

- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations with the
[Kandy Documentation](https://kotlin.github.io/kandy).
3 changes: 3 additions & 0 deletions docs/StardustDocs/topics/guides/Guides-And-Examples.md
@@ -24,6 +24,8 @@ Explore our structured, in-depth guides to steadily improve your Kotlin DataFram

<img src="quickstart_preview.png" border-effect="rounded" width="705"/>

* [](Guide-for-backend-SQL-developers.md) — migration guide for backend developers with SQL/ORM experience moving to Kotlin DataFrame

* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
and make working with your data both convenient and type-safe.

@@ -60,6 +62,7 @@ and make working with your data both convenient and type-safe.
* [Apache Spark Interop (With Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/kotlinSpark)
* [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/multik)
* [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed)
* [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/hibernate)
* [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
— learn how to parse and explore [OpenAPI](https://swagger.io) JSON structures using Kotlin DataFrame,
enabling structured access and intuitive analysis of complex API schemas (*experimental*, supports OpenAPI 3.0.0).