Add SQL-to-Kotlin DataFrame transition guide for backend developers #1377
Merged
225 changes: 225 additions & 0 deletions
docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,225 @@ | ||
| # Kotlin DataFrame for SQL & Backend Developers | ||
|
|
||
| <web-summary> | ||
| Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook. | ||
| </web-summary> | ||
|
|
||
| <card-summary> | ||
| Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe! | ||
| </card-summary> | ||
|
|
||
| <link-summary> | ||
| Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook. | ||
| </link-summary> | ||
|
|
||
| This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts. | ||
|
|
||
| We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook. | ||
|
|
||
| It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem. | ||
|
|
||
| If you plan to work on a Gradle project without a notebook, we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*). | ||
| This plugin generates type-safe schemas at compile time, tracking schema changes throughout your data pipeline. | ||
|
|
||
| <!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide--> | ||
|
|
||
| ## Quick Setup | ||
|
|
||
| To start working with Kotlin DataFrame in a Kotlin Notebook, run a cell with the following code: | ||
|
|
||
| ```kotlin | ||
| %useLatestDescriptors | ||
| %use dataframe | ||
| ``` | ||
|
|
||
| This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame | ||
| rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe). | ||
|
|
||
| --- | ||
|
|
||
| ## 1. What is a DataFrame? | ||
|
|
||
| If you’re used to SQL, a **DataFrame** is conceptually like a **table**: | ||
|
|
||
| - **Rows**: ordered records of data | ||
| - **Columns**: named, typed fields | ||
| - **Schema**: a mapping of column names to types | ||
|
|
||
| Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) — columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening. | ||
|
|
||
| Unlike a relational DB table: | ||
|
|
||
| - A DataFrame **lives in memory** — there’s no storage engine or transaction log | ||
| - It’s **immutable** — each operation produces a *new* DataFrame | ||
| - There is **no concept of foreign keys or relations** between DataFrames | ||
| - It can be created from *any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md), in-memory objects | ||
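The in-memory and immutable nature described above can be illustrated with a minimal sketch. It assumes the Kotlin DataFrame library is on the classpath (in a Kotlin Notebook, `%use dataframe` covers this) and uses the String API, so no compiler plugin or generated extension properties are needed. The `people` data and column names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the column names are illustrative only.
val people = dataFrameOf("name", "age")(
    "Alice", 31,
    "Bob", 25,
)

// add() returns a new DataFrame; `people` itself is left unchanged.
val withFlag = people.add("isAdult") { "age"<Int>() >= 18 }

fun main() {
    println(people.columnNames())   // column names of the original frame
    println(withFlag.columnNames()) // column names of the derived frame
}
```

Because every operation returns a new object, intermediate results can be kept and compared freely, much like named subqueries.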
|
|
||
| --- | ||
|
|
||
| ## 2. Reading Data From SQL | ||
|
|
||
| Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis. | ||
|
|
||
| | Approach | Example | | ||
| |------------------------------------|---------| | ||
| | **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` | | ||
| | **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` | | ||
| | **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` | | ||
| | **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` | | ||
|
|
||
| ```kotlin | ||
| import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig | ||
|
|
||
| val dbConfig = DbConnectionConfig( | ||
| url = "jdbc:postgresql://localhost:5432/mydb", | ||
| user = "postgres", | ||
| password = "secret" | ||
| ) | ||
|
|
||
| // Table | ||
| val customers = DataFrame.readSqlTable(dbConfig, "customers") | ||
|
|
||
| // Query | ||
| val salesByRegion = DataFrame.readSqlQuery(dbConfig, """ | ||
| SELECT region, SUM(amount) AS total | ||
| FROM sales | ||
| GROUP BY region | ||
| """) | ||
|
|
||
| // From JDBC connection (assumes `connection` is an already opened java.sql.Connection) | ||
| connection.readDataFrame("SELECT * FROM orders") | ||
|
|
||
| // From ResultSet | ||
| val rs = connection.createStatement().executeQuery("SELECT * FROM orders") | ||
| rs.readDataFrame(connection) | ||
| ``` | ||
|
|
||
| More information can be found [here](readSqlDatabases.md). | ||
|
|
||
| ## 3. Why It’s Not an ORM | ||
|
|
||
| Frameworks like **Hibernate** or **Exposed**: | ||
| - Map DB tables to Kotlin objects (entities) | ||
| - Track object changes and sync them back to the database | ||
| - Focus on **persistence** and **transactions** | ||
|
|
||
| Kotlin DataFrame: | ||
| - Has no persistence layer | ||
| - Doesn’t try to map rows to mutable entities | ||
| - Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines** | ||
| - The **main idea** is that the schema *changes together with your transformations*, and the [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood. | ||
| - You don’t have to manually define or recreate schemas every time: the plugin infers them from the data or from the transformations you apply. | ||
| - By contrast, the mapping layer in an ORM is **fixed**: schema changes require manual model edits and migrations. | ||
|
|
||
| Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM. | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Key Differences from SQL & ORMs | ||
|
|
||
| | Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame | | ||
| |------------------------------------|-------------------------------------|---------------------------|------------------| | ||
| | **Storage** | Persistent | Persistent | In-memory only | | ||
| | **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations | | ||
| | **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin | | ||
| | **Relations** | Foreign keys | Mapped via annotations | Not applicable | | ||
| | **Transactions** | Yes | Yes | Not applicable | | ||
| | **Indexes** | Yes | Yes (via DB) | Not applicable | | ||
| | **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) | | ||
| | **Joins** | `JOIN` keyword | Eager/lazy loading | `.join()` / `.leftJoin()` DSL | | ||
| | **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | `.groupBy().aggregate()` | | ||
| | **Filtering** | `WHERE` | Criteria API / query DSL | `.filter { ... }` | | ||
| | **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable | | ||
| | **Execution** | On DB engine | On DB engine | In JVM process | | ||
|
|
||
| --- | ||
|
|
||
| ## 5. SQL → Kotlin DataFrame Cheatsheet | ||
|
|
||
| ### DDL Analogues | ||
|
|
||
| | SQL DDL Command / Example | Kotlin DataFrame Equivalent | | ||
| |---------------------------|-----------------------------| | ||
| | **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` | | ||
| | **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` | | ||
| | **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` | | ||
| | **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` | | ||
| | **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` | | ||
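The rows above can be chained into one pipeline. The following is a minimal sketch that uses the String API instead of the extension properties shown in the table, so it does not depend on the compiler plugin; the `sales` data and its column names are hypothetical and exist only for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical in-memory stand-in for a `sales` table.
val sales = dataFrameOf("revenue", "cost", "amount", "old_name", "old_col")(
    100.0, 60.0, 3, "a", "x",
    250.0, 90.0, 7, "b", "y",
)

val reshaped = sales
    // ALTER TABLE ... ADD COLUMN profit ... (revenue - cost)
    .add("profit") { "revenue"<Double>() - "cost"<Double>() }
    // ALTER TABLE ... RENAME COLUMN old_name TO new_name
    .rename("old_name").into("new_name")
    // ALTER TABLE ... DROP COLUMN old_col
    .remove("old_col")
    // ALTER TABLE ... ALTER COLUMN amount TYPE numeric
    .convert("amount").to<Double>()

fun main() {
    println(reshaped)
}
```

Each step returns a new DataFrame with an updated schema, so `sales` still has its original five columns after the pipeline runs.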
|
|
||
| ### DDL Analogues (TODO: decide to remove first DDL section or this) | ||
|
|
||
| | SQL DDL Command | Kotlin DataFrame Equivalent | | ||
| |--------------------------------|------------------------------------------------------------------| | ||
| | `CREATE TABLE` | Define `@DataSchema` interface or class <br>`@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` | | ||
| | `ALTER TABLE ADD COLUMN` | `.add("newCol") { ... }` | | ||
| | `ALTER TABLE DROP COLUMN` | `.remove("colName")` | | ||
| | `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` | | ||
| | `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to<NewType>()` | | ||
|
|
||
| --- | ||
|
|
||
| ### DML Analogues | ||
|
|
||
| | SQL DML Command / Example | Kotlin DataFrame Equivalent | | ||
| |----------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------| | ||
| | `SELECT col1, col2` | `df.select { col1 and col2 }` | | ||
| | `WHERE amount > 100` | `df.filter { amount > 100 }` | | ||
| | `ORDER BY amount DESC` | `df.sortByDesc { amount }` | | ||
| | `GROUP BY region` | `df.groupBy { region }` | | ||
| | `SUM(amount)` | `.aggregate { sum(amount) }` | | ||
| | `JOIN` | `.join(otherDf) { id match right.id }` | | ||
| | `LIMIT 5` | `.take(5)` | | ||
| | **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `df.groupBy { region }.pivot { year }.sum { amount }` | | ||
| | **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` | | ||
| | **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` | | ||
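As an illustration of the `JOIN`, `ORDER BY`, and `LIMIT` rows, here is a small sketch using the String API, where the join key has the same name in both frames. The `orders` and `customers` frames are hypothetical sample data, not part of the guide.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data standing in for two SQL tables.
val orders = dataFrameOf("orderId", "customerId", "amount")(
    1, 10, 50.0,
    2, 11, 75.0,
    3, 10, 20.0,
)

val customers = dataFrameOf("customerId", "name")(
    10, "Alice",
    11, "Bob",
)

// SELECT * FROM orders JOIN customers USING (customerId)
val joined = orders.join(customers, "customerId")

// ... ORDER BY amount DESC LIMIT 1
val topOrder = joined.sortByDesc("amount").take(1)

fun main() {
    println(joined)
    println(topOrder)
}
```

The default join type is an inner join; when key names differ between the frames, the `{ id match right.id }` form from the table applies instead.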
|
|
||
|
|
||
| ## 6. Example: SQL vs DataFrame Side-by-Side | ||
|
|
||
| **SQL (PostgreSQL):** | ||
| ```sql | ||
| SELECT region, SUM(amount) AS total | ||
| FROM sales | ||
| WHERE amount > 0 | ||
| GROUP BY region | ||
| ORDER BY total DESC | ||
| LIMIT 5; | ||
| ``` | ||
|
|
||
| **Kotlin DataFrame:** | ||
| ```kotlin | ||
| sales.filter { amount > 0 } | ||
| .groupBy { region } | ||
| .aggregate { sum(amount).into("total") } | ||
| .sortByDesc { total } | ||
| .take(5) | ||
| ``` | ||
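The pipeline above relies on extension properties (`amount`, `region`, `total`) generated by the notebook or the compiler plugin. For readers who want to try it without either, the following sketch expresses the same query with the String API on hypothetical sample data. It uses `groupBy(...).sum(...)` followed by a rename in place of `aggregate { ... into "total" }`, which is a substitution made for this example rather than the form used in the guide.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data standing in for the `sales` table.
val sales = dataFrameOf("region", "amount")(
    "EU", 120.0,
    "US", 300.0,
    "EU", 80.0,
    "APAC", -15.0, // removed by the filter below
    "US", 50.0,
)

val topRegions = sales
    .filter { "amount"<Double>() > 0.0 }  // WHERE amount > 0
    .groupBy("region")                    // GROUP BY region
    .sum("amount")                        // SUM(amount)
    .rename("amount").into("total")       // AS total
    .sortByDesc("total")                  // ORDER BY total DESC
    .take(5)                              // LIMIT 5

fun main() {
    println(topRegions)
}
```

The operations appear in execution order (filter, group, aggregate, sort, limit), whereas SQL writes `SELECT` first and evaluates it late.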
|
|
||
| ## In conclusion | ||
|
|
||
| - Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin. | ||
| - The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md). | ||
| - It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations. | ||
| - It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources. | ||
| - Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted. | ||
|
|
||
| ## What's Next? | ||
| If you're ready to go through a complete example, we recommend our [Quickstart Guide](quickstart.md), | ||
| where you'll learn the basics of reading data, transforming it, and creating visualizations step by step. | ||
|
|
||
| Ready to go deeper? Check out what’s next: | ||
|
|
||
| - 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets, | ||
| API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame. | ||
|
|
||
| - 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do. | ||
|
|
||
| - 🧠 **Understand the design** and core concepts in the [library overview](concepts.md). | ||
|
|
||
| - 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)** | ||
| and make working with your data both convenient and type-safe. | ||
|
|
||
| - 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)** | ||
| for auto-generated column access in your IntelliJ IDEA projects. | ||
|
|
||
| - 📊 **Master Kandy** for expressive DataFrame visualizations by reading the | ||
| [Kandy Documentation](https://kotlin.github.io/kandy). | ||