Removal of competition manager #869

franchuterivera · 2020-06-01T09:45:25Z

This pull request removes the competition manager. It is currently used in two places:

In automl, for fit automl dataset
In smbo, for loading data

So I removed the fit automl dataset function, so that only the xy datamanager is used.

In smbo, the load function would have become a call to the backend to load the data. I removed the load_data function, which was a straightforward change except on the test side. There, I moved to loading standard datasets using sklearn functions.

autosklearn/automl.py

mfeurer · 2020-06-09T09:14:51Z

autosklearn/estimators.py

        self.logging_config = logging_config
        self.metadata_directory = metadata_directory

-        self._automl = None  # type: Optional[List[BaseAutoML]]


Hm, what's the reason to drop this? Was it incorrect?

I removed the comment here. The logic behind this was:

There was an unnecessary import of BaseAutoML here. I thought it was a leftover that was only kept because of the comment:
self._automl = None  # type: Optional[List[BaseAutoML]]

Here, the automl attribute defaults to None, so the comment didn't seem to make sense to me. Please let me know if you still want to keep that comment!

Hey, that's a type hint: https://docs.python.org/3/library/typing.html

We don't use them as much as we should yet, but I'd like to keep it.

setup.py

test/test_evaluation/evaluation_util.py

mfeurer · 2020-06-09T09:22:53Z

test/test_evaluation/evaluation_util.py

-    D = CompetitionDataManager(dataset_path)
+    # https://www.openml.org/d/183
+    dataset_name = 'abalone'
+    task = 'multiclass.classification'


Shouldn't this be a constant from here?

Did you have a look at this?
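
One way to read the suggestion above is to reference a shared constant for the task type instead of repeating the string literal in the test helper. The following is only a sketch of that idea: the names `MULTICLASS_CLASSIFICATION`, `STRING_TO_TASK_TYPES` and `describe_dataset`, and the integer values, are stand-ins defined inside the example and are not taken from the diff. In the real code the constants would be imported from wherever the project defines them.

```python
# Stand-in constants, defined here only to keep the sketch self-contained.
# In the project these would live in one shared module and be imported.
BINARY_CLASSIFICATION = 1
MULTICLASS_CLASSIFICATION = 2

STRING_TO_TASK_TYPES = {
    'binary.classification': BINARY_CLASSIFICATION,
    'multiclass.classification': MULTICLASS_CLASSIFICATION,
}


def describe_dataset(dataset_name, task_type):
    """Hypothetical helper: bundle a dataset name with a task constant."""
    if task_type not in STRING_TO_TASK_TYPES.values():
        raise ValueError('Unknown task type: %r' % (task_type,))
    return {'name': dataset_name, 'task': task_type}


# https://www.openml.org/d/183
info = describe_dataset('abalone', MULTICLASS_CLASSIFICATION)
```

With a constant, a misspelled task name fails at import or lookup time rather than silently flowing through the test as an arbitrary string.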

test/test_automl/test_automl.py

mfeurer · 2020-06-12T14:33:20Z

autosklearn/estimators.py

        self.logging_config = logging_config
        self.metadata_directory = metadata_directory

-        self._automl = None  # type: Optional[List[BaseAutoML]]


Hey, that's a type hint: https://docs.python.org/3/library/typing.html

We don't use them as much as we should yet, but I'd like to keep it.

mfeurer · 2020-06-12T14:37:07Z

autosklearn/automl.py

        self._resampling_strategy = resampling_strategy
        self._resampling_strategy_arguments = resampling_strategy_arguments \
            if resampling_strategy_arguments is not None else {}
+        if self._resampling_strategy not in ['holdout',


These were so far part of the fit method, is there a reason you moved them to init?

If they are provided in the estimator definition, shouldn't the check be done there?
Otherwise, if someone passes an incorrect resampling strategy when constructing the estimator, they will only find out during fit, and will then have to redefine the estimator.

Hm, true, but then there are a lot of other checks within fit that you did not move. Could you please move them for consistency?
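
The trade-off discussed in this thread can be sketched as follows: a check placed in the constructor reports a bad argument as soon as the object is created, rather than when fit is called. This is a minimal illustration and not the code from the PR. `AutoMLSketch` and `SUPPORTED_RESAMPLING_STRATEGIES` are hypothetical names, and apart from `'holdout'`, which appears in the diff, the accepted values are assumptions.

```python
from typing import Any, Dict, Optional

# Illustrative subset only: the diff shows 'holdout' as one accepted value,
# the remaining entries are assumptions made for this sketch.
SUPPORTED_RESAMPLING_STRATEGIES = ('holdout', 'cv')


class AutoMLSketch:
    """Hypothetical stand-in for the class edited in autosklearn/automl.py."""

    def __init__(
        self,
        resampling_strategy: str = 'holdout',
        resampling_strategy_arguments: Optional[Dict[str, Any]] = None
    ) -> None:
        self._resampling_strategy = resampling_strategy
        self._resampling_strategy_arguments = (
            resampling_strategy_arguments
            if resampling_strategy_arguments is not None else {}
        )
        # Fail at construction time, so a typo is reported before fit() runs.
        if self._resampling_strategy not in SUPPORTED_RESAMPLING_STRATEGIES:
            raise ValueError(
                'Illegal resampling strategy: %s' % self._resampling_strategy
            )
```

Calling `AutoMLSketch(resampling_strategy='holdot')` raises a `ValueError` immediately. If the same check lived only in fit, the error would surface after the estimator had already been configured.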

mfeurer · 2020-06-12T14:38:12Z

Not sure if this PR was ready for review, but could you please have a look at the unit tests, too?

franchuterivera · 2020-06-12T15:17:33Z

Just one last clarification regarding the typing. As it is, it causes an "imported but unused" flake8 error, and that is the reason I originally deleted it.

It currently is like:
self._automl = None # type: Optional[List[BaseAutoML]]

Is it ok to add an automl argument to the constructor, properly define the typing there and default it to None?

codecov-commenter · 2020-06-13T08:35:34Z

Codecov Report

Merging #869 into development will increase coverage by 0.48%.
The diff coverage is 81.48%.

@@               Coverage Diff               @@
##           development     #869      +/-   ##
===============================================
+ Coverage        84.06%   84.55%   +0.48%     
===============================================
  Files              127      126       -1     
  Lines             9435     9218     -217     
===============================================
- Hits              7932     7794     -138     
+ Misses            1503     1424      -79

Impacted Files	Coverage Δ
autosklearn/automl.py	`81.93% <80.39%> (+0.17%)`	⬆️
autosklearn/estimators.py	`90.41% <100.00%> (+0.05%)`	⬆️
autosklearn/smbo.py	`72.72% <100.00%> (-0.70%)`	⬇️
autosklearn/data/abstract_data_manager.py	`77.02% <0.00%> (-12.17%)`	⬇️
autosklearn/evaluation/__init__.py	`80.54% <0.00%> (-2.17%)`	⬇️
autosklearn/ensemble_builder.py	`69.89% <0.00%> (-1.19%)`	⬇️
autosklearn/evaluation/train_evaluator.py	`72.62% <0.00%> (+0.45%)`	⬆️
...eline/components/feature_preprocessing/fast_ica.py	`97.82% <0.00%> (+6.52%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d313f26...b460564. Read the comment docs.

scripts/run_auto-sklearn_for_metadata_generation.py

mfeurer · 2020-06-13T09:22:29Z

test/test_evaluation/evaluation_util.py

-    D = CompetitionDataManager(dataset_path)
+    # https://www.openml.org/d/183
+    dataset_name = 'abalone'
+    task = 'multiclass.classification'


Did you have a look at this?

mfeurer · 2020-06-13T09:24:55Z

> Is it ok to add an automl argument to the constructor, properly define the typing there and default it to None?

Yes, totally
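
The proposal agreed on here could look roughly like the sketch below. It assumes annotation syntax is acceptable for the Python versions the project supports. `BaseAutoML` is redefined as an empty stand-in so the example runs on its own, and `EstimatorSketch` is a hypothetical name, not the real estimator class.

```python
from typing import List, Optional


class BaseAutoML:
    """Stand-in for the real BaseAutoML class, so the sketch runs on its own."""


class EstimatorSketch:
    """Hypothetical estimator showing the proposed constructor argument."""

    def __init__(self, automl: Optional[List[BaseAutoML]] = None) -> None:
        # The annotation is ordinary code, so linters see BaseAutoML,
        # List and Optional as used names.
        self._automl = automl


estimator = EstimatorSketch()
```

Compared with the `# type:` comment, the annotation keeps the same type information. Because it references the imported names in real code, it should also avoid the unused-import warning that motivated removing the comment.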

mfeurer

Hey, this looks really good, but I think there's at least one more necessary change.

* Removal of competition manager
* Removed additional unused methods/files and moved metrics to estimator
* Fix meta data generation
* Make sure pytest is older newer than 4.6
* Unit tst fixing
* flake8 fixes in examples
* Fix metadata gen metrics

* PEP8 (#718) * multioutput_regression * multioutput_regression * multioutput_regression * multioutput regression * multioutput regression * multioutput regression * multioutput regression * multioutput regression * #782 showcase pipeline components iteration * Fixed flake-8 violations * multi_output regression v1 * fix y_shape in multioutput regression * fix xy_data_manager change due to merge * automl.py missing import * Release note 070 (#842) * First version of 070 release notes * Missed a bugfix * Vim added unexpected space -- fix * prepare new release (#846) * Clip predict values to [0-1] in classification * Fix for 3.5 python! * Sensible default value of 'score_func' for SelectPercentileRegression (#843) Currently default value of 'score_func' for SelectPercentileRegression is "f_classif", which is an invalid value, and will surely be rejected and will not work * More robust tmp file naming (#854) * More robust tmp file naming * UUID approach * 771 worst possible result (#845) * Initial Commit * Make worst result a function * worst possible result in metric * Fixing the name of the scorers * Add exceptions to log file, not just stdout (#863) * Add exceptions to log file, not just stdout * Removing dummy pred as trys is not needed * Add prediction with models trained with cross-validation (#864) * add the possibility to predict with cross-validation * fix unit tests * test new feature, too * 715 ml memory (#865) * #715 Support for no ml memory limit * API update * Docs enhancement (#862) * Improved docs * Fixed example typos * Beautify examples * cleanup examples * fixed rsa equal * Move to minmax scaler (#866) * Do not read predictions in memory, only after score (#870) * Do not read predictions in memory, only after score * Precission support for string/int * Removal of competition manager (#869) * Removal of competition manager * Removed additional unused methods/files and moved metrics to estimator * Fix meta data generation * Make sure pytest is older 
newer than 4.6 * Unit tst fixing * flake8 fixes in examples * Fix metadata gen metrics * Fix dataprocessing get params (#877) * Fix dataprocessing get params * Add clone-test to regression pipeline * Allow 1-D threshold binary predictions (#879) * fix single output regression not working * regression need no _enusre_prediction_array_size_prediction_array_sizess * #782 showcase pipeline components iteration * Fixed flake-8 violations * Release note 070 (#842) * First version of 070 release notes * Missed a bugfix * Vim added unexpected space -- fix * prepare new release (#846) * Clip predict values to [0-1] in classification * Fix for 3.5 python! * Sensible default value of 'score_func' for SelectPercentileRegression (#843) Currently default value of 'score_func' for SelectPercentileRegression is "f_classif", which is an invalid value, and will surely be rejected and will not work * More robust tmp file naming (#854) * More robust tmp file naming * UUID approach * 771 worst possible result (#845) * Initial Commit * Make worst result a function * worst possible result in metric * Fixing the name of the scorers * Add exceptions to log file, not just stdout (#863) * Add exceptions to log file, not just stdout * Removing dummy pred as trys is not needed * Add prediction with models trained with cross-validation (#864) * add the possibility to predict with cross-validation * fix unit tests * test new feature, too * 715 ml memory (#865) * #715 Support for no ml memory limit * API update * Docs enhancement (#862) * Improved docs * Fixed example typos * Beautify examples * cleanup examples * fixed rsa equal * Move to minmax scaler (#866) * Do not read predictions in memory, only after score (#870) * Do not read predictions in memory, only after score * Precission support for string/int * Removal of competition manager (#869) * Removal of competition manager * Removed additional unused methods/files and moved metrics to estimator * Fix meta data generation * Make sure pytest 
is older newer than 4.6 * Unit tst fixing * flake8 fixes in examples * Fix metadata gen metrics * Fix dataprocessing get params (#877) * Fix dataprocessing get params * Add clone-test to regression pipeline * Allow 1-D threshold binary predictions (#879) * multioutput_regression * multioutput_regression * multioutput_regression * multioutput_regression * multioutput_regression * multioutput_regression * multioutput regression * multioutput regression * multioutput regression * multioutput regression * multi_output regression v1 * fix y_shape in multioutput regression * fix xy_data_manager change due to merge * fix single output regression not working * regression need no _enusre_prediction_array_size_prediction_array_sizess * Add prediction with models trained with cross-validation (#864) * add the possibility to predict with cross-validation * fix unit tests * test new feature, too * multioutput_regression * multioutput_regression * multioutput_regression * Removal of competition manager (#869) * Removal of competition manager * Removed additional unused methods/files and moved metrics to estimator * Fix meta data generation * Make sure pytest is older newer than 4.6 * Unit tst fixing * flake8 fixes in examples * Fix metadata gen metrics * multioutput after rebased to 0.7.0 Problem: Cause: Solution: * Regressor target y shape index out of range * Revision for make tester * Revision: Cancel Multiclass-MultiOuput * Resolve automl.py metrics(__init__) reg_gb reg_svm * Fix Flake8 errors * Fix automl.py flake8 * Preprocess w/ mulitout reg,automl self._n_outputs * test_estimator.py changed back * cancel multioutput multiclass for multi reg * Fix automl self._n_output update placement * fix flake8 * Kernel pca cancelled mulitout reg * Kernel PCA test skip python <3.8 * Add test unit for multioutput reg and fix. 
* Fix flake8 error * Kernel PCA multioutput regression * default kernel to cosine, dodge sklearn=0.22 error * Kernel PCA should be updated to 0.23 * Kernel PCA uses rbf kernel * Kernel Pca * Modify labels in reg, class, perpro in examples * Kernel PCA * Add missing supports to mincoal and truncateSVD Co-authored-by: Matthias Feurer <[email protected]> Co-authored-by: chico <[email protected]> Co-authored-by: Francisco Rivera Valverde <[email protected]> Co-authored-by: Xiaodong DENG <[email protected]>

mfeurer reviewed Jun 9, 2020

mfeurer reviewed Jun 12, 2020

mfeurer reviewed Jun 13, 2020

mfeurer requested changes Jun 13, 2020

franchuterivera added 7 commits June 13, 2020 12:09

Removal of competition manager

e0d50b4

Removed additional unused methods/files and moved metrics to estimator

ea55e12

Fix meta data generation

a723b39

Make sure pytest is older newer than 4.6

bc9a508

Unit tst fixing

7636e7c

flake8 fixes in examples

9dcb3c8

Fix metadata gen metrics

b460564

franchuterivera force-pushed the remove_competition_loader branch from 5bb86d2 to b460564 on June 13, 2020 10:12

mfeurer approved these changes Jun 15, 2020

mfeurer merged commit 47a3f12 into automl:development Jun 15, 2020
