Merged
Changes from 5 commits
276 changes: 271 additions & 5 deletions core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/insert.kt
@@ -10,53 +10,230 @@ import org.jetbrains.kotlinx.dataframe.annotations.Interpretable
import org.jetbrains.kotlinx.dataframe.annotations.Refine
import org.jetbrains.kotlinx.dataframe.columns.ColumnAccessor
import org.jetbrains.kotlinx.dataframe.columns.ColumnPath
import org.jetbrains.kotlinx.dataframe.documentation.DocumentationUrls
import org.jetbrains.kotlinx.dataframe.documentation.DslGrammarLink
import org.jetbrains.kotlinx.dataframe.documentation.ExcludeFromSources
import org.jetbrains.kotlinx.dataframe.documentation.Indent
import org.jetbrains.kotlinx.dataframe.documentation.LineBreak
import org.jetbrains.kotlinx.dataframe.documentation.SelectingColumns
import org.jetbrains.kotlinx.dataframe.impl.api.insertImpl
import org.jetbrains.kotlinx.dataframe.impl.columnName
import org.jetbrains.kotlinx.dataframe.impl.removeAt
import org.jetbrains.kotlinx.dataframe.util.DEPRECATED_ACCESS_API
import org.jetbrains.kotlinx.dataframe.util.INSERT_AFTER_COL_PATH
import org.jetbrains.kotlinx.dataframe.util.INSERT_AFTER_COL_PATH_REPLACE
import kotlin.reflect.KProperty

// region DataFrame

// region insert

/**
* This function does not insert the new column immediately; instead, it specifies the column to insert
* and returns an [InsertClause],
* which serves as an intermediate step.
* The [InsertClause] object provides methods to insert a new column using:
* - [under][InsertClause.under] - inserts a new column under the specified column group.
* - [after][InsertClause.after] - inserts a new column after the specified column.
* - [at][InsertClause.at] - inserts a new column at the specified position.
*
* Each method returns a new [DataFrame] with the inserted column.
*
* Check out [Grammar].
*
* @include [SelectingColumns.ColumnGroupsAndNestedColumnsMention]
*
* See [Selecting Columns][InsertSelectingOptions].
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See also:
* - [move][DataFrame.move] - move columns to a new position within the [DataFrame].
* - [add][DataFrame.add] - add new columns to the end of the [DataFrame]
* (without specifying a position).
*/
internal interface InsertDocs {

/**
* {@comment Version of [SelectingColumns] with correctly filled in examples}
* @include [SelectingColumns] {@include [SetInsertOperationArg]}
*/
interface InsertSelectingOptions

/**
Collaborator

Don't forget everything explicitly written in Kotlin code needs to be bold. You forgot one insert and all braces and commas should also be bold

* ## Insert Operation Grammar
* {@include [LineBreak]}
* {@include [DslGrammarLink]}
* {@include [LineBreak]}
*
* **[`insert`][insert]**`(column: `[`DataColumn`][DataColumn]`)`
*
* {@include [Indent]}
* `| `**[`insert`][insert]**`(name: `[`String`][String]`, infer: `[`Infer`][Infer]`, expression: `[`AddExpression`][AddExpression]`)`
Collaborator

hmm this 'or' looks a bit weird. I think it would look better on the same line as the previous insert, but then the line might be a bit too long. Or it could have grouping braces ()

*
* {@include [Indent]}
* __`.`__[**`under`**][InsertClause.under]` { `**`column: `[`ColumnSelector`][ColumnSelector]**` }`
*
* {@include [Indent]}
* `| `__`.`__[**`under`**][InsertClause.under]`(columnPath: `[`ColumnPath`][ColumnPath]`)`
*
* {@include [Indent]}
* `| `__`.`__[**`after`**][InsertClause.after]` { `**`column: `[`ColumnSelector`][ColumnSelector]**` }`
*
* {@include [Indent]}
* `| `__`.`__[**`at`**][InsertClause.at]`(position: `[`Int`][Int]`)`
*/
interface Grammar
}

/** {@set [SelectingColumns.OPERATION] [insert][insert]} */
@ExcludeFromSources
private interface SetInsertOperationArg

/**
* Inserts the given [column] into this [DataFrame].
*
* {@include [InsertDocs]}
*
* ### Examples:
* ```kotlin
* // Insert a new column "age" under the column group with path ("info", "personal").
* df.insert(age).under(pathOf("info", "personal"))
Collaborator

I'd love the examples, the main problem as usual - the update of these examples, but if we are close to the 1.0 - let's practice with it

Collaborator Author

One day we can move it to tests and add to KDocs using kodex ;)

Collaborator

Actually yes! That's already possible. You can use @sample [link.to.sample]; KoDEx even recognizes // SampleStart etc, so we can reuse Korro samples even :)

Collaborator

I just never used this before because I preferred the examples be clickable, but maybe this kdoc/dokka team is working on that!

*
* // Insert a new column "count" after the column "url".
* df.insert(count).after { url }
* ```
*
* @param [column] A single [DataColumn] to insert into the [DataFrame].
* @return An [InsertClause] for specifying the placement of the new column.
*/
public fun <T, C> DataFrame<T>.insert(column: DataColumn<C>): InsertClause<T> = InsertClause(this, column)

/**
* Creates a new column using the provided [expression][AddExpression] and inserts it into this [DataFrame].
*
* {@include [AddExpressionDocs]}
*
* {@include [InsertDocs]}
*
* ### Examples
*
* ```kotlin
* // Insert a new column "sum" after the "firstValue" column; for each row, it contains
* // the sum of the values in the "firstValue" and "secondValue" columns.
* val dfWithSum = df.insert("sum") { firstValue + secondValue }.after { firstValue }
*
* // Insert a new "fibonacci" column with the Fibonacci sequence under a "math" column group:
* // for the first two rows, the value is 1;
* // for subsequent rows, it's the sum of the two previous Fibonacci values.
* val dfWithFibonacci = df.insert("fibonacci") {
* if (index() < 2) 1
* else prev()!!.newValue<Int>() + prev()!!.prev()!!.newValue<Int>()
Collaborator

eyy! Now possible because of the switch to AddExpression, nice ;)

* }.under("math")
* ```
*
* @param [name] The name of the new column to be created and inserted.
* @param [infer] Controls how values are inferred when building the new column. Defaults to [Infer.Nulls].
* @param [expression] An [AddExpression] that computes the value for each row of the new column.
* @return An [InsertClause] for specifying the placement of the newly created column.
*/
@Interpretable("Insert1")
public inline fun <T, reified R> DataFrame<T>.insert(
name: String,
infer: Infer = Infer.Nulls,
-noinline expression: RowExpression<T, R>,
+noinline expression: AddExpression<T, R>,
): InsertClause<T> = insert(mapToColumn(name, infer, expression))

@Deprecated(DEPRECATED_ACCESS_API)
@AccessApiOverload
public inline fun <T, reified R> DataFrame<T>.insert(
column: ColumnAccessor<R>,
infer: Infer = Infer.Nulls,
-noinline expression: RowExpression<T, R>,
+noinline expression: AddExpression<T, R>,
): InsertClause<T> = insert(column.name(), infer, expression)

@Deprecated(DEPRECATED_ACCESS_API)
@AccessApiOverload
public inline fun <T, reified R> DataFrame<T>.insert(
column: KProperty<R>,
infer: Infer = Infer.Nulls,
-noinline expression: RowExpression<T, R>,
+noinline expression: AddExpression<T, R>,
): InsertClause<T> = insert(column.columnName, infer, expression)

// endregion

/**
* An intermediate class used in the [insert] operation.
*
* This class itself does not perform any insertion; it is a transitional step
* before specifying where to insert the new column.
* It must be followed by one of the inserting methods
* to produce a new [DataFrame] with an inserted column.
*
* Use the following methods to perform the insertion:
* - [under][InsertClause.under] - inserts a new column under the specified column group.
* - [after][InsertClause.after] - inserts a new column after the specified column.
* - [at][InsertClause.at] - inserts a new column at the specified position.
*
* See [Grammar][InsertDocs.Grammar] for more details.
*/
public class InsertClause<T>(internal val df: DataFrame<T>, internal val column: AnyCol) {
override fun toString(): String = "InsertClause(df=$df, column=$column)"
}

// region under

/**
* Inserts the new column previously specified with [insert] under
* the selected [column group][column].
*
* Works only with existing column groups.
* To insert into a new column group, use the overloads:
* `under(columnPath: ColumnPath)` or `under(column: String)`.
Collaborator

could you create and link to an issue if we're gonna fix this?

*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* See [SelectingColumns.Dsl].
*
* ### Examples
* ```kotlin
* // Insert a new column "age" under the column group with path ("info", "personal")
* df.insert(age).under { info.personal }
*
* // Insert a new column "sum" under the only top-level column group
* val dfWithSum = df.insert("sum") { a + b }.under { colGroups().single() }
* ```
*
* @param column The [ColumnSelector] used to choose an existing column group in this [DataFrame]
* under which the new column will be inserted.
* @return A new [DataFrame] with the inserted column placed under the selected group.
*/
@Refine
@Interpretable("Under0")
public fun <T> InsertClause<T>.under(column: ColumnSelector<T, *>): DataFrame<T> = under(df.getColumnPath(column))

/**
* Inserts the new column previously specified with [insert] under
* the column group defined by the given [columnPath].
*
* {@include [org.jetbrains.kotlinx.dataframe.documentation.ColumnPathCreation]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* ### Example
* ```kotlin
* // Insert a new column "age" under the column group with path ("info", "personal")
* df.insert(age).under(pathOf("info", "personal"))
* ```
*
* @param [columnPath] The [ColumnPath] specifying the path to a column group in this [DataFrame]
* under which the new column will be inserted.
* @return A new [DataFrame] with the inserted column placed under the specified column group.
*/
@Refine
@Interpretable("Under1")
public fun <T> InsertClause<T>.under(columnPath: ColumnPath): DataFrame<T> =
@@ -70,6 +247,26 @@ public fun <T> InsertClause<T>.under(column: ColumnAccessor<*>): DataFrame<T> =
@AccessApiOverload
public fun <T> InsertClause<T>.under(column: KProperty<*>): DataFrame<T> = under(column.columnName)

/**
* Inserts the new column previously specified with [insert] under
* the given column group by its [name][column].
*
* If the column group with the provided [name][column] does not exist, it will be created automatically.
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* ### Example
* ```kotlin
* // Insert a new column "age" under the "info" column group.
* df.insert(age).under("info")
* ```
*
* @param [column] The [name][String] of the column group in this [DataFrame].
* If the group does not exist, it will be created.
* @return A new [DataFrame] with the inserted column placed under the specified column group.
*/
@Refine
@Interpretable("Under4")
public fun <T> InsertClause<T>.under(column: String): DataFrame<T> = under(pathOf(column))
@@ -78,29 +275,98 @@ public fun <T> InsertClause<T>.under(column: String): DataFrame<T> = under(pathO

// region after

/**
* Inserts the new column previously specified with [insert]
* at the position immediately after the selected [column] (on the same level).
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* See also: [SelectingColumns.Dsl].
*
* ### Examples:
* ```kotlin
* // Insert a new column "age" after the "name" column
* df.insert(age).after { name }
*
* // Insert a new column "sum" after the nested "min" column (inside the "stats" column group)
* val dfWithSum = df.insert("sum") { a + b }.after { stats.min }
* ```
*
* @param [column] The [ColumnSelector] used to choose an existing column in this [DataFrame],
* after which the new column will be inserted.
* @return A new [DataFrame] with the inserted column placed after the selected column.
*/
@Refine
@Interpretable("InsertAfter0")
-public fun <T> InsertClause<T>.after(column: ColumnSelector<T, *>): DataFrame<T> = after(df.getColumnPath(column))
+public fun <T> InsertClause<T>.after(column: ColumnSelector<T, *>): DataFrame<T> = afterImpl(df.getColumnPath(column))

/**
* Inserts the new column previously specified with [insert]
* at the position immediately after the column with the given [name][column].
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* See also: [SelectingColumns.ColumnNames].
*
* ### Example
* ```kotlin
* // Insert a new column "age" after the "name" column
* df.insert(age).after("name")
* ```
*
* @param [column] The [String] name of the column in this [DataFrame]
* after which the new column will be inserted.
* @return A new [DataFrame] with the inserted column placed after the specified column.
*/
public fun <T> InsertClause<T>.after(column: String): DataFrame<T> = df.add(this.column).move(this.column).after(column)

@Deprecated(DEPRECATED_ACCESS_API)
@AccessApiOverload
-public fun <T> InsertClause<T>.after(column: ColumnAccessor<*>): DataFrame<T> = after(column.path())
+public fun <T> InsertClause<T>.after(column: ColumnAccessor<*>): DataFrame<T> = afterImpl(column.path())

@Deprecated(DEPRECATED_ACCESS_API)
@AccessApiOverload
public fun <T> InsertClause<T>.after(column: KProperty<*>): DataFrame<T> = after(column.columnName)

@Deprecated(INSERT_AFTER_COL_PATH, ReplaceWith(INSERT_AFTER_COL_PATH_REPLACE), DeprecationLevel.ERROR)
public fun <T> InsertClause<T>.after(columnPath: ColumnPath): DataFrame<T> {
val dstPath = ColumnPath(columnPath.removeAt(columnPath.size - 1) + column.name())
return df.insertImpl(dstPath, column).move { dstPath }.after { columnPath }
}

internal fun <T> InsertClause<T>.afterImpl(columnPath: ColumnPath): DataFrame<T> {
val dstPath = ColumnPath(columnPath.removeAt(columnPath.size - 1) + column.name())
return df.insertImpl(dstPath, column).move { dstPath }.after { columnPath }
}

// endregion

// region at

/**
* Inserts the new column previously specified with [insert]
* at the given [position] in the [DataFrame].
*
* The new column will be placed at the specified index, shifting existing columns to the right.
*
* For more information: {@include [DocumentationUrls.Insert]}
*
* See [Grammar][InsertDocs.Grammar] for more details.
*
* ### Example
* ```kotlin
* // Insert a new column "age" at index 3
* df.insert(age).at(3)
* ```
*
* @param [position] The [Int] index where the new column should be inserted.
* Columns currently at this index and after will be shifted right.
* @return A new [DataFrame] with the inserted column placed at the specified position.
*/
@Refine
@Interpretable("InsertAt")
public fun <T> InsertClause<T>.at(position: Int): DataFrame<T> = df.add(column).move(column).to(position)
@@ -134,4 +134,7 @@ internal interface DocumentationUrls {

/** [See `format` on the documentation website.]({@include [Url]}/format.html) */
interface Format

/** [See `insert` on the documentation website.]({@include [Url]}/insert.html) */
interface Insert
}
@@ -137,6 +137,11 @@ internal const val COL_TYPE_INSTANT =
"kotlinx.datetime.Instant is deprecated in favor of kotlin.time.Instant. Either migrate to kotlin.time.Instant and use ColType.StdlibInstant or use ColType.DeprecatedInstant. $MESSAGE_1_0 and migrated to kotlin.time.Instant in 1.1."
internal const val COL_TYPE_INSTANT_REPLACE = "ColType.DeprecatedInstant"

internal const val INSERT_AFTER_COL_PATH =
"This `after()` overload will be removed in favor of `after { }` with Column Selection " +
Collaborator

unnecessary line break

"DSL. $MESSAGE_1_0"
internal const val INSERT_AFTER_COL_PATH_REPLACE = "this.after { columnPath }"

// endregion

// region WARNING in 1.0, ERROR in 1.1
@@ -83,7 +83,7 @@ internal class InsertAfter0 : AbstractInterpreter<PluginDataFrameSchema>() {

override fun Arguments.interpret(): PluginDataFrameSchema {
return receiver.df.asDataFrame()
-.insert(receiver.column.asDataColumn()).after(column.col.path)
+.insert(receiver.column.asDataColumn()).after { column.col.path }
Collaborator

this does nothing, the compiler plugin lives in kotlin now ;P

Collaborator Author

I just like the whole project to be able to build. Maybe we should disable the build of the plugin module then?

Collaborator

Let's merge it then and disable plugin module as another PR later

.toPluginDataFrameSchema()
}
}
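
A note for readers following the `afterImpl` change in insert.kt: the destination-path step (drop the last segment of the target path, then append the new column's name) can be sketched without the library. This is only an illustration and not part of the PR. It assumes a column path behaves like a plain list of name segments, and `siblingPath` is a hypothetical helper name.

```kotlin
// Hypothetical stand-in for ColumnPath: a path is modeled as a list of name segments.
fun siblingPath(target: List<String>, newName: String): List<String> =
    // Drop the last segment (the target column itself) and append the new column's name,
    // so the new column ends up in the same group as the target column.
    target.dropLast(1) + newName

fun main() {
    // Nested target: the new column goes into the same "stats" group.
    println(siblingPath(listOf("stats", "min"), "sum")) // prints [stats, sum]
    // Top-level target: the new column is also top-level.
    println(siblingPath(listOf("name"), "age")) // prints [age]
}
```

In the diff above, the column is first inserted at this destination path and then moved directly after the target column, which is why `after` keeps the new column on the same level as the selected one.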