I am trying to extend auto-sklearn with new categorical encoders such as the CatBoost encoder and the target encoder, using the scikit-learn-contrib/category_encoders package. As a spot check, I am using only the CatBoost encoder and forbidding the other encoders (OHE) to see whether I get acceptable results.
Here is the script for the CatBoost encoder, which I added under autosklearn->pipeline->data_preprocessing->categorical-encoding:
from ConfigSpace.configuration_space import ConfigurationSpace

from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm
from autosklearn.pipeline.constants import DENSE, SPARSE, UNSIGNED_DATA, INPUT
# Import the encoder class itself; the cat_boost module is not callable.
# Aliased to avoid clashing with the component class defined below.
from category_encoders.cat_boost import CatBoostEncoder as CatBoostEncoderImpl


class CatBoostEncoder(AutoSklearnPreprocessingAlgorithm):
    def __init__(self, random_state=None):
        self.random_state = random_state
        # Set in fit(); transform() checks this to detect an unfitted component.
        self.preprocessor = None

    def fit(self, X, y=None):
        self.preprocessor = CatBoostEncoderImpl()
        self.preprocessor.fit(X, y)
        return self

    def transform(self, X):
        if self.preprocessor is None:
            raise NotImplementedError()
        return self.preprocessor.transform(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {'shortname': 'CatBoostEnc',
                'name': 'CatBoost Encoder',
                'handles_regression': True,
                'handles_classification': True,
                'handles_multiclass': True,
                'handles_multilabel': True,
                'handles_multioutput': True,
                # TODO find out if this is right!
                'handles_sparse': True,
                'handles_dense': True,
                'input': (DENSE, SPARSE, UNSIGNED_DATA),
                'output': (INPUT,), }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        # No tunable hyperparameters are exposed for this encoder.
        return ConfigurationSpace()
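To check the wrapper logic independently of auto-sklearn, the same fit/transform delegation can be exercised on its own. The following is only a minimal sketch: `MeanTargetEncoder` and `EncoderWrapper` are hypothetical stand-ins introduced for illustration, and the stand-in uses a plain per-category target mean, not the actual CatBoost encoding scheme.

```python
class MeanTargetEncoder:
    """Hypothetical stand-in for a category_encoders encoder.

    Uses a plain per-category target mean, NOT the real CatBoost scheme.
    """

    def fit(self, X, y):
        sums, counts = {}, {}
        for value, target in zip(X, y):
            sums[value] = sums.get(value, 0.0) + target
            counts[value] = counts.get(value, 0) + 1
        self.means_ = {v: sums[v] / counts[v] for v in sums}
        # Categories unseen during fit fall back to the global target mean.
        self.default_ = sum(y) / len(y)
        return self

    def transform(self, X):
        return [self.means_.get(value, self.default_) for value in X]


class EncoderWrapper:
    """Hypothetical wrapper mirroring the component's delegation pattern."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self.preprocessor = None  # set in fit()

    def fit(self, X, y=None):
        # Instantiate a class here; calling a module would raise TypeError.
        self.preprocessor = MeanTargetEncoder()
        self.preprocessor.fit(X, y)
        return self

    def transform(self, X):
        if self.preprocessor is None:
            raise NotImplementedError()
        return self.preprocessor.transform(X)


wrapper = EncoderWrapper()
encoded = wrapper.fit(["a", "b", "a", "c"], [1, 0, 0, 1]).transform(["a", "b", "d"])
print(encoded)
```

If the wrapper behaves as expected in isolation like this, the remaining question is whether auto-sklearn actually registers and selects the new component.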
As a spot check, I print the encoders available at run time, which are:
But the output pipeline looks something like this:
Is there something I am missing here? Is there any documentation I can refer to for adding new categorical encoders? I would appreciate any help.
Thanks