diff --git a/docs/StardustDocs/d.tree b/docs/StardustDocs/d.tree
index 4bc8dabea3..42a51a1b76 100644
--- a/docs/StardustDocs/d.tree
+++ b/docs/StardustDocs/d.tree
@@ -20,6 +20,7 @@
+
diff --git a/docs/StardustDocs/topics/gettingStarted/Modules.md b/docs/StardustDocs/topics/gettingStarted/Modules.md
new file mode 100644
index 0000000000..f9c0ac0383
--- /dev/null
+++ b/docs/StardustDocs/topics/gettingStarted/Modules.md
@@ -0,0 +1,568 @@
+# Modules
+
+Kotlin DataFrame is composed of modules, allowing you to include only the functionality you need.
+In addition, Kotlin DataFrame provides several [plugins](#plugins)
+that significantly enhance the development experience,
+making it more convenient, powerful, and enjoyable to work with.
+
+| Module | Function |
+|--------|----------|
+| **General** | |
+| `dataframe` | General artifact that combines all core and IO artifacts except experimental ones. |
+| **Core** | |
+| `dataframe-core` | The DataFrame API and its implementation. |
+| **IO** | |
+| `dataframe-json` | Provides support for reading and writing the JSON format. |
+| `dataframe-csv` | Provides support for reading and writing the CSV format. |
+| `dataframe-excel` | Provides support for reading and writing the XLS/XLSX formats. |
+| `dataframe-jdbc` | Provides support for reading from JDBC data sources. |
+| `dataframe-arrow` | Provides support for reading and writing the Apache Arrow format. |
+| **Experimental modules** | |
+| `dataframe-geo` | Provides a new API for working with geospatial data and IO for geographic formats (GeoJSON, Shapefile). |
+| `dataframe-openapi` | Provides support for reading and writing the OpenAPI JSON format. |
+| `dataframe-openapi-generator` | Provides schema generation from OpenAPI specifications. Requires `dataframe-openapi`. |
+| **Plugins** | |
+| `kotlin.plugin.dataframe` | Kotlin compiler plugin. Generates extension properties at compile time. |
+| `kotlinx.dataframe` | Gradle plugin. Provides schema generation using Gradle. |
+| `kotlinx.dataframe:symbol-processor-all` | KSP plugin. Provides schema generation using KSP. |
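+
+As a quick orientation before the per-module sections, here is a minimal sketch of a `build.gradle.kts` fragment for a project that only needs CSV and JDBC support.
+It assumes the Gradle Kotlin DSL and picks individual modules from the table above instead of the general `dataframe` artifact; the particular combination is only an illustration.
+
+```kotlin
+// build.gradle.kts (fragment): select individual modules rather than the general artifact.
+repositories {
+    mavenCentral()
+}
+
+dependencies {
+    // Core API and implementation. Declared explicitly here for clarity
+    // (assumption: the IO modules may already bring it in transitively).
+    implementation("org.jetbrains.kotlinx:dataframe-core:%dataFrameVersion%")
+
+    // CSV reading and writing.
+    implementation("org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%")
+
+    // Reading from SQL databases through JDBC.
+    implementation("org.jetbrains.kotlinx:dataframe-jdbc:%dataFrameVersion%")
+}
+```
+
+If you prefer not to manage modules individually, the single `dataframe` artifact covers all core and IO modules.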
+
+## Configure the repository
+
+All Kotlin DataFrame modules are available from the Maven Central repository.
+To use them, add Maven Central to the `repositories` block of your build script:
+
+```kotlin
+repositories {
+    mavenCentral()
+}
+```
+
+```groovy
+repositories {
+    mavenCentral()
+}
+```
+
+## `dataframe` - general Kotlin DataFrame dependency {id="dataframe-general"}
+
+General-purpose artifact that includes all [core](#core-modules) and [IO](#io-modules) modules.
+Does **not** include any [experimental modules](#experimental-modules).
+Recommended if you don’t need fine-grained control over individual module dependencies.
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
+}
+```
+
+## Core Kotlin DataFrame modules {id="core-modules"}
+
+#### `dataframe-core`
+
+The core [DataFrame](DataFrame.md) API and its implementation.
+Includes all core functionality for working with data structures, expressions, schema management, and operations.
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-core:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-core:%dataFrameVersion%'
+}
+```
+
+## IO Kotlin DataFrame modules {id="io-modules"}
+
+#### `dataframe-json` {id="dataframe-json"}
+
+Provides everything DataFrame needs to work with JSON data sources:
+[reading](https://kotlin.github.io/dataframe/read.html#read-from-json)
+and [writing](https://kotlin.github.io/dataframe/write.html#writing-to-json).
+It's based on [Kotlinx Serialization](https://github.com/Kotlin/kotlinx.serialization).
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-json:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-json:%dataFrameVersion%'
+}
+```
+
+#### `dataframe-csv` {id="dataframe-csv"}
+
+Provides support for reading and writing CSV files.
+Supports standard CSV format features such as delimiters, headers, and quotes.
+
+Based on high-performance [Deephaven CSV](https://github.com/deephaven/deephaven-csv).
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%'
+}
+```
+
+Note that `dataframe-json` is included with `dataframe-csv` by default.
+This is to support JSON structures inside CSV files.
+If you don't need this functionality, you can exclude it like so:
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%") {
+        exclude("org.jetbrains.kotlinx", "dataframe-json")
+    }
+}
+```
+
+```groovy
+dependencies {
+    implementation('org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%') {
+        exclude group: 'org.jetbrains.kotlinx', module: 'dataframe-json'
+    }
+}
+```
+
+#### `dataframe-excel` {id="dataframe-excel"}
+
+Provides support for reading and writing Excel files (`.xls` and `.xlsx`).
+Compatible with standard spreadsheet editors and supports embedded structured data.
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%'
+}
+```
+
+Note that `dataframe-json` is included with `dataframe-excel` by default.
+This is to support JSON structures inside Excel files.
+If you don't need this functionality, you can exclude it like so:
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%") {
+        exclude("org.jetbrains.kotlinx", "dataframe-json")
+    }
+}
+```
+
+```groovy
+dependencies {
+    implementation('org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%') {
+        exclude group: 'org.jetbrains.kotlinx', module: 'dataframe-json'
+    }
+}
+```
+
+#### `dataframe-jdbc` {id="dataframe-jdbc"}
+
+Provides everything DataFrame needs to work with
+SQL databases that are accessible through JDBC.
+
+See [Read from SQL databases](https://kotlin.github.io/dataframe/readsqldatabases.html) for more information
+about how to use it.
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-jdbc:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-jdbc:%dataFrameVersion%'
+}
+```
+
+#### `dataframe-arrow` {id="dataframe-arrow"}
+
+Provides everything DataFrame needs to work with
+[Apache Arrow](https://arrow.apache.org).
+
+See [Read Apache Arrow formats](https://kotlin.github.io/dataframe/read.html#read-apache-arrow-formats) and
+[Writing to Apache Arrow formats](https://kotlin.github.io/dataframe/write.html#writing-to-apache-arrow-formats)
+for more information about how to use it.
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-arrow:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-arrow:%dataFrameVersion%'
+}
+```
+
+## Experimental Kotlin DataFrame modules {id="experimental-modules"}
+
+These modules are experimental and may be unstable.
+
+#### `dataframe-geo`
+
+Provides a new API for working with geospatial data,
+including reading and writing geospatial formats (GeoJSON, Shapefile),
+and performing geometry-aware operations.
+
+See the [Geo guide](https://kotlin.github.io/kandy/geo-plotting-guide.html) for more details and examples.
+
+Requires the [OSGeo repository](https://repo.osgeo.org).
+
+```kotlin
+repositories {
+    maven("https://repo.osgeo.org/repository/release")
+}
+
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-geo:%dataFrameVersion%")
+}
+```
+
+```groovy
+repositories {
+    maven {
+        url 'https://repo.osgeo.org/repository/release'
+    }
+}
+
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-geo:%dataFrameVersion%'
+}
+```
+
+#### `dataframe-openapi`
+
+Provides functionality to support auto-generated data schemas from OpenAPI 3.0.0 specifications.
+This module is a companion to [`dataframe-openapi-generator`](#dataframe-openapi-generator):
+
+- `dataframe-openapi-generator` is used internally by the Gradle plugin and Jupyter integration
+  to generate data schemas from OpenAPI specs.
+  In the Gradle plugin, it powers the `dataschemas {}` DSL and the `@file:ImportDataSchema()` annotation.
+  In Jupyter, it enables the `importDataSchema()` function.
+
+- `dataframe-openapi` must be added as a dependency to the user project in order to use those generated data schemas.
+
+See:
+- [Import OpenAPI Schemas in Gradle project](https://kotlin.github.io/dataframe/schemasimportopenapigradle.html)
+- [Import Data Schemas, e.g. from OpenAPI, in Jupyter](https://kotlin.github.io/dataframe/schemasimportopenapijupyter.html)
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-openapi:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-openapi:%dataFrameVersion%'
+}
+```
+
+#### `dataframe-openapi-generator`
+
+Provides the logic and tooling necessary to import OpenAPI 3.0.0 specifications
+as auto-generated data schemas for Kotlin DataFrame.
+This module works in conjunction with [`dataframe-openapi`](#dataframe-openapi):
+
+- `dataframe-openapi-generator` is used internally by the Gradle plugin and Jupyter integration
+  to generate data schemas from OpenAPI specifications.
+    - In Gradle, it enables the `dataschemas {}` DSL and the `@file:ImportDataSchema()` annotation.
+    - In Jupyter, it powers the `importDataSchema()` function.
+
+- `dataframe-openapi` must be added as a dependency to the user project to actually use the generated schemas.
+
+See:
+- [Import OpenAPI Schemas in Gradle project](https://kotlin.github.io/dataframe/schemasimportopenapigradle.html)
+- [Import Data Schemas, e.g. from OpenAPI, in Jupyter](https://kotlin.github.io/dataframe/schemasimportopenapijupyter.html)
+
+```kotlin
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe-openapi-generator:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe-openapi-generator:%dataFrameVersion%'
+}
+```
+
+## Plugins
+
+#### `kotlin.plugin.dataframe` – Kotlin DataFrame Compiler Plugin {id="kotlin.plugin.dataframe"}
+
+The Kotlin DataFrame compiler plugin enables support for [extension properties](extensionPropertiesApi.md)
+in Gradle projects, allowing you to work with dataframes in a name- and type-safe manner.
+
+See the [Compiler Plugin setup guide](Compiler-Plugin.md#setup) for installation
+and usage instructions for Gradle projects.
+
+Published as an official Kotlin plugin.
+[Source code is available in the Kotlin repository](https://github.com/JetBrains/kotlin/tree/master/plugins/kotlin-dataframe).
+
+#### `kotlinx.dataframe` – Gradle Plugin {id="kotlinx.dataframe"}
+
+The Gradle plugin generates [data schemas](schemas.md) from data samples
+in supported formats, such as JSON, CSV, or Excel files and URLs, as well as from data fetched from SQL databases,
+using Gradle.
+
+See the [Gradle Plugin Reference](DataSchemaGenerationGradle.md) for installation
+and usage instructions in Gradle projects.
+
+> By default, the Gradle plugin also applies the [KSP plugin](#ksp-plugin).
+
+```kotlin
+plugins {
+    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
+}
+```
+
+```groovy
+plugins {
+    id 'org.jetbrains.kotlinx.dataframe' version '%dataFrameVersion%'
+}
+```
+
+#### `kotlinx.dataframe:symbol-processor-all` – KSP Plugin {id="ksp-plugin"}
+
+The KSP plugin generates [data schemas](schemas.md) from data samples
+in supported formats, such as JSON, CSV, or Excel files and URLs, as well as from data fetched from SQL databases,
+using Kotlin Symbol Processing (KSP).
+This is useful for projects where you prefer or require schema generation at the source level.
+
+See [Data Schemas in Gradle Projects](schemasGradle.md) for usage details.
+
+```kotlin
+dependencies {
+    ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%")
+}
+```
+
+```groovy
+dependencies {
+    ksp 'org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%'
+}
+```
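+
+The `ksp(...)` dependency configuration used above is contributed by the KSP Gradle plugin, so a build script that declares the symbol processor directly also needs that plugin applied.
+A minimal sketch, assuming the Gradle Kotlin DSL; `<ksp-version>` is a placeholder rather than a real version and should be replaced with a KSP release that matches your Kotlin version:
+
+```kotlin
+// build.gradle.kts (fragment)
+plugins {
+    // KSP Gradle plugin; it provides the `ksp` dependency configuration.
+    id("com.google.devtools.ksp") version "<ksp-version>" // placeholder version (assumption)
+}
+
+dependencies {
+    // Kotlin DataFrame symbol processor, same coordinates as in the snippet above.
+    ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%")
+}
+```
+
+This extra step may be unnecessary when the `kotlinx.dataframe` Gradle plugin is applied, since that plugin applies the KSP plugin by default.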