Update example on extending data preprocessing #1269
Codecov Report
@@             Coverage Diff              @@
##           development    #1269   +/-   ##
===============================================
- Coverage        88.09%   88.02%   -0.07%
===============================================
  Files              140      140
  Lines            11144    11147       +3
===============================================
- Hits              9817     9812       -5
- Misses            1327     1335       +8
I tried the example and it works as is, but if you change

Hi @mdbecker, thanks for reporting this. I'll have a look into it, but I'm not sure why that would be.

    def __init__(self, **kwargs):
        """This preprocessor does not change the data"""
        self.preprocessor = None
        # Some internal checks make sure parameters are set

What do you mean by "some internal checks"?

I'm not entirely sure where it was, but there is a check somewhere in the pipeline that verifies that certain attributes are set on the object from the **kwargs, hence the snippet right below this:

        for key, val in kwargs.items():
            setattr(self, key, val)
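
As a rough illustration of why that loop matters, here is a minimal sketch in plain Python. The actual check inside auto-sklearn is not identified in this thread, so `missing_attributes` is a hypothetical stand-in for it, `PassThroughComponent` is a toy class rather than a real auto-sklearn component, and `random_state` is only an assumed example of a keyword argument the pipeline might pass.

```python
class PassThroughComponent:
    """Toy component that accepts arbitrary keyword arguments."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute of the same name,
        # mirroring the loop in the example under review.
        for key, val in kwargs.items():
            setattr(self, key, val)


def missing_attributes(component, expected):
    """Hypothetical check: return the expected names not set on the component."""
    return [name for name in expected if not hasattr(component, name)]


# Assumed keyword argument, chosen only for illustration.
component = PassThroughComponent(random_state=42)

print(missing_attributes(component, ["random_state"]))               # []
print(missing_attributes(PassThroughComponent(), ["random_state"]))  # ['random_state']
```

Without the `setattr` loop, the first call would also report `random_state` as missing, which is the kind of failure a check of this sort would raise.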

Hi @mdbecker, I couldn't reproduce the failure. I've put my entire code snippet below, which will hopefully help diagnose any issues.

import autosklearn
# Imported explicitly so that add_preprocessor below is reachable
import autosklearn.pipeline.components.data_preprocessing
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm
from sklearn.datasets import load_breast_cancer
import sklearn.metrics
from autosklearn.pipeline.constants import SPARSE, DENSE, UNSIGNED_DATA, INPUT
from sklearn.model_selection import train_test_split
from ConfigSpace.configuration_space import ConfigurationSpace

X, y = load_breast_cancer(return_X_y=True)


class NoPreprocessing(AutoSklearnPreprocessingAlgorithm):
    def __init__(self, **kwargs):
        """This preprocessor does not change the data"""
        # Some internal checks make sure parameters are set
        for key, val in kwargs.items():
            setattr(self, key, val)

    def fit(self, X, Y=None):
        # Nothing to learn, so fitting is a no-op
        return self

    def transform(self, X):
        # Pass the data through unchanged
        return X

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            "shortname": "NoPreprocessing",
            "name": "NoPreprocessing",
            "handles_regression": True,
            "handles_classification": True,
            "handles_multiclass": True,
            "handles_multilabel": True,
            "handles_multioutput": True,
            "is_deterministic": True,
            "input": (SPARSE, DENSE, UNSIGNED_DATA),
            "output": (INPUT,),
        }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        # Return an empty configuration space, as there are no hyperparameters
        return ConfigurationSpace()


# Add NoPreprocessing component to auto-sklearn.
autosklearn.pipeline.components.data_preprocessing.add_preprocessor(NoPreprocessing)

if __name__ == "__main__":
    clf = AutoSklearnClassifier(
        time_left_for_this_task=120,
        include={
            'data_preprocessor': ['NoPreprocessing']
        },
        # The two flags below are provided to speed up calculations
        # Not recommended for a real implementation
        initial_configurations_via_metalearning=0,
        smac_scenario_args={'runcount_limit': 5},
        n_jobs=7
    )
    clf.fit(X, y)
    print(clf.sprint_statistics())

Output:

[WARNING] [2021-11-03 11:30:21,982:Client-AutoML(1):0bb3609f-3c91-11ec-a202-ec7949506548] Capping the per_run_time_limit to 59.0 to have time for a least 2 models in each process.
auto-sklearn results:
Dataset name: 0bb3609f-3c91-11ec-a202-ec7949506548
Metric: accuracy
Best validation score: 0.957447
Number of target algorithm runs: 5
Number of successful target algorithm runs: 4
Number of crashed target algorithm runs: 1
Number of target algorithms that exceeded the time limit: 0
Number of target algorithms that exceeded the memory limit: 0
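
To read the run counts above at a glance, the following sketch parses the relevant lines and reports the success rate. It works only on the text copied from this output; `parse_counts` is a hypothetical helper written for this illustration, not part of auto-sklearn.

```python
import re

# Run counts copied from the statistics output above.
stats_text = """\
Number of target algorithm runs: 5
Number of successful target algorithm runs: 4
Number of crashed target algorithm runs: 1
"""


def parse_counts(text):
    """Hypothetical helper: map each 'Label: <int>' line to an integer."""
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(.+?):\s*(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(stats_text)
total = counts["Number of target algorithm runs"]
successful = counts["Number of successful target algorithm runs"]
print(f"{successful}/{total} runs succeeded ({successful / total:.0%})")
# prints: 4/5 runs succeeded (80%)
```

So the run completed with the NoPreprocessing component included, with one of the five configurations crashing.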

This PR fixes a broken example on extending autosklearn with a NoPreprocessing step for data preprocessing. It also updates the docs to distinguish this.
Closes #1257