Docs enhancement #862
Conversation
doc/manual.rst
Outdated
*auto-sklearn* uses ensemble selection by `Caruana et al. (2004) <https://dl.acm.org/doi/pdf/10.1145/1015330.1015432>`_
to build an ensemble based on the models’ prediction for the validation set. The following hyperparameters control how the ensemble is constructed:

* ``ensemble_size`` determines the maximal size of the ensemble. if it is set to zero, no ensemble will be constructed.
The 'if' should be upper case.
doc/manual.rst
Outdated
to build an ensemble based on the models’ prediction for the validation set. The following hyperparameters control how the ensemble is constructed:

* ``ensemble_size`` determines the maximal size of the ensemble. if it is set to zero, no ensemble will be constructed.
* ``ensemble_nbest`` allows the user to directly specify the number of models used for the ensemble. This hyperparameter can be an integer *n*, such that only the best *n* models are used in the final ensemble. If a float between 0.0 and 1.0 is provided, ``ensemble_nbest`` would be interpreted as a fraction suggesting the percentage of models to use in the ensemble building process.
I wonder if one should write 'number of models considered for the ensemble' instead of 'used for the ensemble' as there is no guarantee that they end up in the final ensemble.
Also, could you please add that if ensemble_nbest is a float, this is library pruning as described in https://dl.acm.org/doi/10.1109/ICDM.2006.76
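For context, the ensemble selection procedure the quoted docs refer to (Caruana et al., 2004) is a greedy forward search over a library of fitted models. The sketch below is an illustrative NumPy version, not auto-sklearn's actual implementation: the function name, the MSE loss, and selection with replacement are all assumptions made for the example.

```python
import numpy as np

def greedy_ensemble_selection(val_preds, y_val, ensemble_size):
    """Caruana-style greedy ensemble selection (sketch, with replacement).

    val_preds: list of 1-D arrays of validation predictions, one per model.
    Returns the chosen model indices (repeats allowed, so a model can
    receive a larger weight by being picked several times).
    """
    chosen = []
    running = np.zeros_like(y_val, dtype=float)  # sum of chosen predictions
    for _ in range(ensemble_size):
        # Pick the model whose inclusion minimises validation MSE of the
        # averaged ensemble prediction.
        errs = [np.mean(((running + p) / (len(chosen) + 1) - y_val) ** 2)
                for p in val_preds]
        best = int(np.argmin(errs))
        chosen.append(best)
        running += val_preds[best]
    return chosen

# Toy usage: model 0 predicts the validation targets perfectly,
# so the greedy search keeps selecting it.
y = np.array([0., 1., 1., 0.])
preds = [y.copy(), np.ones(4), np.zeros(4)]
print(greedy_ensemble_selection(preds, y, 3))  # [0, 0, 0]
```

With an integer ``ensemble_nbest``, only the best *n* models would be offered to this search; with a float, the library is first pruned to that fraction of models, as the review comment above notes.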
doc/manual.rst
Outdated
algorithm runs.
The results obtained from the final ensemble can be printed by calling ``show_models()``.

The results obtained from the final ensemble can be printed by calling ``show_models()``. *auto-sklearn* ensemble is composed of scikit-learn that can be inspected as exemplified by
Shouldn't there be a 'models' after scikit-learn?
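To illustrate the point being reviewed — that the ensemble members are ordinary scikit-learn models — here is an analogy using plain scikit-learn rather than the auto-sklearn API (the `VotingClassifier` stand-in and the two member estimators are assumptions for the example):

```python
# Analogy with plain scikit-learn (not the auto-sklearn API): the members
# of a fitted ensemble are ordinary scikit-learn estimators, so each one
# can be inspected like any other fitted model.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
ens = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]).fit(X, y)

# Each fitted member is a regular estimator with its own attributes.
for name, model in ens.named_estimators_.items():
    print(name, type(model).__name__)
```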
examples/example_crossvalidation.py
Outdated
main()
############################################################################
# Data Loading
# ======================================
The underline should only be as long as the heading.
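Concretely, sphinx-gallery example scripts carry reST headings in comments, and the underline should be exactly as long as the heading text (the heading string below is taken from the quoted diff; generating the underline programmatically is just an illustration):

```python
# In a sphinx-gallery script the corrected heading would look like:
#
# ############################################################################
# # Data Loading
# # ============
#
# A matching underline is simply the marker repeated len(heading) times:
heading = "Data Loading"
underline = "=" * len(heading)
print(heading)
print(underline)
```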
examples/example_crossvalidation.py
Outdated
############################################################################
# Building and fitting the classifier
# ======================================
Same here and everywhere. I don't know why the rst format demands that...
examples/example_crossvalidation.py
Outdated
per_run_time_limit=30,
tmp_folder='/tmp/autosklearn_cv_example_tmp',
output_folder='/tmp/autosklearn_cv_example_out',
delete_tmp_folder_after_terminate=False,
Could you please delete this line? It is actually not necessary.
examples/example_feature_types.py
Outdated
# Data Loading
# ======================================
# Load adult dataset from openml.org, see https://www.openml.org/t/2117
openml.config.apikey = '610344db6388d9ba34f6db45a3cf71de'
Actually, this is no longer necessary and we can use sklearn's fetch_openml function for this https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html
Could you please replace the data loading with the sklearn function?
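The replacement the reviewer suggests could look like the sketch below, using scikit-learn's real `fetch_openml` function. The dataset name `"adult"` and `version=2` are assumptions made for the example (the right version depends on the task); the call needs network access and no API key.

```python
from sklearn.datasets import fetch_openml

def load_adult():
    # Fetch the Adult dataset from openml.org by name -- no API key needed.
    # ("adult" / version=2 are assumptions; pick the version matching
    # the OpenML task referenced in the example.)
    X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
    return X, y
```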
############################################################################
# Define the spawner
Could you please name this header 'Define utility function for multiprocessing'?
############################################################################
# Define a small callback that instantiates SuccessiveHalving
Could you please drop the 'small'?
Actually, could you please ensure that the examples are executed only once? Currently, they are executed as part of the 1st and 4th entries of the unit-test matrix. This should only be a matter of moving the docpush to the first entry of the test matrix.
Force-pushed from bc9f454 to 17f0adc
I had to do a rebase, due to some conflicts. The doc push runs the examples, so I removed the examples flag from the unit tests. I will verify that this is the case when Travis completes.
Codecov Report

@@           Coverage Diff            @@
##           development    #862   +/- ##
==========================================
- Coverage       84.02%   83.93%   -0.09%
==========================================
  Files             127      127
  Lines            9458     9458
==========================================
- Hits             7947     7939       -8
- Misses           1511     1519       +8

Continue to review full report at Codecov.
I missed a lot, sorry. I did it in vim, and for some reason my buffer missed some files. I just did it again and verified file by file.
Force-pushed from 634aef6 to d6a66de
* Improved docs
* Fixed example typos
* Beautify examples
* cleanup examples
* fixed rsa equal
This PR enhances the documentation of auto-sklearn via:
Rendering the examples in the web page: aligned the Travis configuration to what OpenML has, fixed some docs configurations that were outdated, and re-arranged the examples so that they can be rendered via reST code blocks (otherwise the output is shown before the code).
Improved the ML memory limit documentation via an example addition and API docs.
Updated the installation documentation due to "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)" #856