Skip to content

Conversation

eddiebergman
Copy link
Contributor

@eddiebergman eddiebergman commented Sep 15, 2021

Note: Built from PR Fix sparse y test #1248, should be considered after that.
Note2: Rebasing added some other PR's into this -_-

This PR fixes some typing issues that were present, notably the self.automl_ attribute is now created during __init__() rather than on the first call to fit, removing the _build_automl() function.

This was primarily to get rid of the typing warning that self.automl_.predict: None has no attribute predict(), as mypy can not establish that self.automl_ had been set in any methods.

This PR also does some other small typing cleanups.

  • Union[SUPPORTED_TARGET_TYPES, spmatrix] -> SUPPORTED_TARGET_TYPES
  • Move self.per_run_time_limit to be set at construction
  • Explicitly marks AutoSklearnEstimator as an ABC
    • Was always functioning this way but was not marked as such
  • AutoSklearnEstimator now relies on subclasses having two @abstractmethod so that validation only has to occur once and not in each subclass.
    • _supported_target_types
    • _automl_cls
  • Removed method predict_proba from AutoSklearnEstimator, this should only be available in the subclass AutoSklearnClassifier
  • Removed AutoSklearnRegressor.fit so it just uses the base class fit method. It was doing nothing but forwarding arguments.
  • AutoSklearn2Classifer now explicitly sets attributes on the already constructed self.automl_, where previously it relied on it being constructed during fit

Main Change

# Now
class AutoSklearnEstimator(...):

    def __init__(...):
        self.automl_ = self._automl_cls(...) # Has type AutoML 
    
    def fit(...):
        ...

# Before
class AutoSklearnEstimator(...):

    def __init__(...):
         self.automl_ = None # Has type None
    
    def fit(...):
        if not self.automl_:
            self._build_automl()
        ...
        
    def _build_automl(self):
        self.automl_ = self._get_automl_cls()(...)

Extra Context

In general, as estimators are build in a two stage process, during __init__ and then remaining variables in fit, we end up with many attributes set to None during the __init__ phase. This causes problems with mypy's typing as it can not garuntee that it will not be None when the are used later.

There are two solutions to this I can think of:

  • Move any attributes we can create to __init__ from fit. In general this is probably better practice and safer.

  • Move attributes to a @property

    @property
    def attribute_wanted(self) -> T:
        if not self._attribute_wanted:
            self._attribute_wanted = ...
        return self._attribute_wanted

    This second solution works fine when we can use the combo @property attributed_wanted with self._attributed_wanted.
    It has the issue when we want @property _attribute_wanted and self.__attributed_wanted.

    While we could have the reference as __attributed_wanted, with the double leading underscore, this invokes pythons name scrambling and subclass will not be able to access this attribute directly and will be require to use the @property _attributed_wanted. This may or may not be a problem.

mdbecker and others added 9 commits November 3, 2021 14:28
* Setting `feature_preprocessors` Throws a ValueError. kwarg is `feature_preprocessor`
* Updated code samples to use ..code:: python
* Fixes failing test by setting random_state for make_x from sklearn.datsets
* Uses full path in `setup.py` instead of relative one
* Added contribution guide
* Fixes not checking for y_test being sparse
* Update example on extending data preprocessing with `no_preprocessing`
Update a link for `scenario` argument for the new SMAC documentation
* Added Citation.cff file
* Fixes old reference to HPOlibConfigSpace to ConfigSpace* Fixes
@eddiebergman
Copy link
Contributor Author

eddiebergman commented Nov 4, 2021

Need to check if calling fit twice will work.

Edit: Done, test/test_automl/test_estimators.py::test_can_fit_twice

'mlp',
]
self.include['classifier'] = include_estimators
self.automl_._include['classifier'] = include_estimators
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to not building automl_ on fit, we have to set this manually as we do for the estimator.

self.metric = log_loss

self.automl_._metric = self.metric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here

# Set the variables in the SklearnEstimator.automl
self.automl_._resampling_strategy = resampling_strategy
self.automl_._resampling_strategy_arguments = resampling_strategy_kwargs
self.automl_._get_smac_object_callback = smac_callback
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An here

@eddiebergman eddiebergman deleted the typing_fixes_in_estimators branch February 9, 2022 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants