
Conversation

eddiebergman
Contributor

This PR fixes a broken example of extending auto-sklearn with a NoPreprocessing step for data preprocessing.
It also updates the docs to make the distinction between data preprocessing and feature preprocessing clear.

Closes #1257
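
For context, a minimal sketch of how the two steps are configured independently through the include argument (the feature_preprocessor names listed here are built-in components and purely illustrative; NoPreprocessing must first be registered via add_preprocessor as in the full example further down this thread):

from autosklearn.classification import AutoSklearnClassifier

clf = AutoSklearnClassifier(
    time_left_for_this_task=120,
    include={
        # the data preprocessing step, replaced here by the custom component
        # from this PR's example (registered beforehand with add_preprocessor)
        "data_preprocessor": ["NoPreprocessing"],
        # the separate feature preprocessing step, restricted to two built-ins
        "feature_preprocessor": ["no_preprocessing", "polynomial"],
    },
)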

@codecov

codecov bot commented Oct 16, 2021

Codecov Report

Merging #1269 (eadd632) into development (502c136) will decrease coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@               Coverage Diff               @@
##           development    #1269      +/-   ##
===============================================
- Coverage        88.09%   88.02%   -0.07%     
===============================================
  Files              140      140              
  Lines            11144    11147       +3     
===============================================
- Hits              9817     9812       -5     
- Misses            1327     1335       +8     
Impacted Files Coverage Δ
...arn/pipeline/components/classification/__init__.py 84.94% <100.00%> (+0.16%) ⬆️
...pipeline/components/data_preprocessing/__init__.py 82.97% <100.00%> (ø)
...eline/components/feature_preprocessing/__init__.py 89.33% <100.00%> (+0.14%) ⬆️
...sklearn/pipeline/components/regression/__init__.py 83.52% <100.00%> (+0.19%) ⬆️
...eline/components/feature_preprocessing/fast_ica.py 91.30% <0.00%> (-6.53%) ⬇️
...mponents/feature_preprocessing/nystroem_sampler.py 85.29% <0.00%> (-5.89%) ⬇️
autosklearn/util/logging_.py 88.96% <0.00%> (-1.38%) ⬇️
...ine/components/classification/gradient_boosting.py 93.04% <0.00%> (-0.87%) ⬇️
...ipeline/components/regression/gradient_boosting.py 93.26% <0.00%> (+1.92%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 502c136...eadd632. Read the comment docs.

@mdbecker
Contributor

I tried the example and it works as is, but if you change n_jobs it fails with Number of crashed target algorithm runs: 5. I assume this has something to do with Dask but don't know enough to be sure. Here is a minimum example to reproduce the failure assuming you've done everything else in the example code:

clf = AutoSklearnClassifier(
    time_left_for_this_task=120,
    include={
        'data_preprocessor': ['NoPreprocessing']
    },
    # The two flags below are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 5},
    n_jobs=7
)
clf.fit(X_train, y_train)
print(clf.sprint_statistics())

@eddiebergman
Contributor Author

Hi @mdbecker,

Thanks for reporting this, I'll have a look into it but I'm not sure why that would be.
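
One way to dig further (a sketch, not something run here; the path is hypothetical and the rest of the setup is assumed to be the same as in the snippet above): keep auto-sklearn's temporary output around and inspect the per-run logs, which usually contain the traceback of crashed runs.

clf = AutoSklearnClassifier(
    # ... same arguments as in the snippet above ...
    tmp_folder="/tmp/autosklearn_nopreprocessing_debug",  # hypothetical path
    delete_tmp_folder_after_terminate=False,
)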

def __init__(self, **kwargs):
    """This preprocessor does not change the data"""
    self.preprocessor = None
    # Some internal checks make sure parameters are set
Contributor


What do you mean by some internal checks?

Contributor Author


I'm not entirely sure where it is, but there is a check somewhere in the pipeline that verifies certain attributes are set on the object from the **kwargs, hence the snippet right below this:

for key, val in kwargs.items():
    setattr(self, key, val)
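
A minimal sketch of why mirroring **kwargs as attributes matters (the 'strategy' hyperparameter and the ExamplePreprocessing class are made up for illustration): when a component declares hyperparameters, auto-sklearn instantiates it with the sampled values as keyword arguments, and the pipeline later expects to read those values back as attributes of the same names.

from ConfigSpace.configuration_space import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter
from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm


class ExamplePreprocessing(AutoSklearnPreprocessingAlgorithm):

    def __init__(self, **kwargs):
        # Mirror every keyword argument as an attribute so the pipeline's
        # internal checks can read the configured values back later.
        for key, val in kwargs.items():
            setattr(self, key, val)

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        cs = ConfigurationSpace()
        # 'strategy' is a hypothetical hyperparameter used only for illustration;
        # its sampled value is what gets passed to __init__ above as a kwarg.
        cs.add_hyperparameter(
            CategoricalHyperparameter("strategy", ["a", "b"], default_value="a")
        )
        return cs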

@eddiebergman
Contributor Author

Hi @mdbecker,

I couldn't reproduce the failure, I've put my entire code snippet below which could hopefully help diagnose any issues.

import autosklearn
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm
from sklearn.datasets import load_breast_cancer
import sklearn.metrics
from autosklearn.pipeline.constants import SPARSE, DENSE, UNSIGNED_DATA, INPUT
from sklearn.model_selection import train_test_split
from ConfigSpace.configuration_space import ConfigurationSpace

X, y = load_breast_cancer(return_X_y=True)


class NoPreprocessing(AutoSklearnPreprocessingAlgorithm):

    def __init__(self, **kwargs):
        """This preprocessors does not change the data"""
        # Some internal checks makes sure parameters are set
        for key, val in kwargs.items():
            setattr(self, key, val)

    def fit(self, X, Y=None):
        return self

    def transform(self, X):
        return X

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            "shortname": "NoPreprocessing",
            "name": "NoPreprocessing",
            "handles_regression": True,
            "handles_classification": True,
            "handles_multiclass": True,
            "handles_multilabel": True,
            "handles_multioutput": True,
            "is_deterministic": True,
            "input": (SPARSE, DENSE, UNSIGNED_DATA),
            "output": (INPUT,),
        }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        return ConfigurationSpace()  # Return an empty configuration space, as there are no hyperparameters


# Add NoPreprocessing component to auto-sklearn.
autosklearn.pipeline.components.data_preprocessing.add_preprocessor(NoPreprocessing)


if __name__ == "__main__":

    clf = AutoSklearnClassifier(
        time_left_for_this_task=120,
        include={
            'data_preprocessor': ['NoPreprocessing']
        },
        # The two flags below are provided to speed up calculations
        # Not recommended for a real implementation
        initial_configurations_via_metalearning=0,
        smac_scenario_args={'runcount_limit': 5},
        n_jobs=7
    )
    clf.fit(X, y)
    print(clf.sprint_statistics())

Output:

[WARNING] [2021-11-03 11:30:21,982:Client-AutoML(1):0bb3609f-3c91-11ec-a202-ec7949506548] Capping the per_run_time_limit to 59.0 to have time for a least 2 models in each process.
auto-sklearn results:
  Dataset name: 0bb3609f-3c91-11ec-a202-ec7949506548
  Metric: accuracy
  Best validation score: 0.957447
  Number of target algorithm runs: 5
  Number of successful target algorithm runs: 4
  Number of crashed target algorithm runs: 1
  Number of target algorithms that exceeded the time limit: 0
  Number of target algorithms that exceeded the memory limit: 0

@eddiebergman eddiebergman merged commit 89d6018 into development Nov 3, 2021
@eddiebergman eddiebergman deleted the update_example_extending_data_preprocessing branch November 3, 2021 13:36
Successfully merging this pull request may close these issues.

Turning off the data preprocessing step causes algorithms to crash