Release notes:
- Updates RAPIDS dependencies to 25.06
- Spark Rapids Connect ML plugin improvements:
  - Extends the Spark Rapids Connect ML plugin to support accelerated KMeans, LinearRegression, RandomForest regression and classification, and PCA.
  - Adds runtime Spark configs for verbose, float32_inputs, and num_workers so that these can be set over Spark Connect when using the accelerated plugin.
  - Improves transfer of RandomForest models from Python to the JVM.
  - Bundles the plugin jar for Spark 4.0 in the pip package.
- Bug fixes in UMAP and in LogisticRegression on large datasets.
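
As an illustration of the runtime configs listed above, the following minimal sketch builds key/value pairs for verbose, float32_inputs, and num_workers and applies them through a session's `conf.set`, which is how runtime configs are normally set from a Spark Connect client. The `spark.rapids.ml.` key prefix and the helper names `rapids_ml_runtime_confs` and `apply_confs` are assumptions made for this example; check the spark-rapids-ml documentation for the exact config names.

```python
from typing import Any, Dict, Optional

# ASSUMPTION: illustrative key prefix; confirm the exact config names
# in the spark-rapids-ml documentation before relying on them.
CONF_PREFIX = "spark.rapids.ml."


def _to_conf_value(value: Any) -> str:
    """Render a Python value as a Spark config string (booleans as true/false)."""
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


def rapids_ml_runtime_confs(
    verbose: Optional[bool] = None,
    float32_inputs: Optional[bool] = None,
    num_workers: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build conf entries, skipping parameters left unset."""
    values = {
        "verbose": verbose,
        "float32_inputs": float32_inputs,
        "num_workers": num_workers,
    }
    return {
        CONF_PREFIX + name: _to_conf_value(value)
        for name, value in values.items()
        if value is not None
    }


def apply_confs(spark: Any, confs: Dict[str, str]) -> None:
    """Apply each entry through the session's runtime config (spark.conf.set)."""
    for key, value in confs.items():
        spark.conf.set(key, value)


# Example use with a Spark Connect session (requires a running server):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
#   apply_confs(spark, rapids_ml_runtime_confs(verbose=True, num_workers=2))
```

Leaving a parameter as `None` keeps it out of the session config, so the plugin's own default would apply.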
Known issues:
- RandomForest inference:
  - May fail on nodes with multiple GPUs. As a workaround, convert the model via the cpu() API for CPU-based inference.
  - May fail for very wide inputs (e.g. > 10000 features).
- CrossValidator for RandomForest over Spark Connect will fail in Spark 4.0. A fix is pending in Spark 4.1.
The pip package is available at https://pypi.org/project/spark-rapids-ml/25.06.0/