generateEncoder() fails for data class with ByteArray field #111

@mlin

I have a data class containing a ByteArray blob field. When I try to create a Dataset of these (kotlin-spark-api v1.02, Spark v3.1.2), encoder generation fails with:

Exception in thread "main" java.lang.ClassCastException: class org.apache.spark.sql.types.BinaryType$ cannot be cast to class org.apache.spark.sql.types.ObjectType (org.apache.spark.sql.types.BinaryType$ and org.apache.spark.sql.types.ObjectType are in unnamed module of loader 'app')
        at org.apache.spark.sql.KotlinReflection$.toCatalystArray$1(KotlinReflection.scala:609)
        at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$1(KotlinReflection.scala:788)
        at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
        at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects(KotlinReflection.scala:1012)
        at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects$(KotlinReflection.scala:1011)
        at org.apache.spark.sql.KotlinReflection$.cleanUpReflectionObjects(KotlinReflection.scala:47)
        at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:591)
        at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$16(KotlinReflection.scala:761)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
        at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$1(KotlinReflection.scala:748)
        at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
        at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects(KotlinReflection.scala:1012)
        at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects$(KotlinReflection.scala:1011)
        at org.apache.spark.sql.KotlinReflection$.cleanUpReflectionObjects(KotlinReflection.scala:47)
        at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:591)
        at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:578)
        at org.apache.spark.sql.KotlinReflection.serializerFor(KotlinReflection.scala)
        at org.jetbrains.kotlinx.spark.api.ApiV1Kt.kotlinClassEncoder(ApiV1.kt:180)
        at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
...

A minimal repro is simply:

import org.jetbrains.kotlinx.spark.api.*

data class BlobTest(val blob: ByteArray) {
    constructor(str: String) : this(str.toByteArray())
}

fun main() {
    withSpark {
        // Building the Dataset calls generateEncoder() for BlobTest, which throws.
        dsOf(BlobTest("foo"), BlobTest("bar"))
    }
}
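
Until the encoder handles ByteArray fields, one possible stopgap is to keep the blob out of the data class as a ByteArray and carry it as a Base64 string instead. The sketch below is only an illustration: `EncodedBlob`, `encodeBlob`, and `decodeBlob` are hypothetical names, not part of kotlin-spark-api, and it assumes (without having verified it against v1.02) that a data class with a single String field encodes without problems.

```kotlin
import java.util.Base64

// Hypothetical stand-in for BlobTest: the blob travels as a Base64 String,
// so the data class exposes no ByteArray property to the encoder.
data class EncodedBlob(val blobBase64: String)

// Wrap raw bytes into the String-backed data class.
fun encodeBlob(bytes: ByteArray): EncodedBlob =
    EncodedBlob(Base64.getEncoder().encodeToString(bytes))

// Recover the original bytes when they are actually needed.
fun EncodedBlob.decodeBlob(): ByteArray =
    Base64.getDecoder().decode(blobBase64)
```

With this in place the repro would become `dsOf(encodeBlob("foo".toByteArray()), encodeBlob("bar".toByteArray()))`. The cost is that Base64 inflates the stored data by roughly a third and the column is a string rather than binary, so this is a workaround rather than a fix.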
