Does output with R2 below simple multiple regression indicate error or tuning need? #784

@EricLuellen

The script output below shows an unusually low R2 of .12. The expected result was an R2 higher than that of multiple regression (.55). Does this indicate a poor use case for auto-sklearn, a need for parameter or hyperparameter adjustments, or some other error in use?

15:11:48 PRIVATE python3 eluellen-sklearn.py
/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.classification module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
warnings.warn(message, FutureWarning)
Samples = 2619, Features = 40
X_train = [[4.96200e+04 1.71090e+04 3.44800e-01 ... 1.70000e+04 3.26200e+04
6.57400e-01]
[5.95000e+04 0.00000e+00 0.00000e+00 ... 5.95000e+04 0.00000e+00
0.00000e+00]
[4.65400e+04 4.65400e+04 1.00000e+00 ... 1.15400e+04 3.50000e+04
7.52000e-01]
...
[5.25800e+04 3.14100e+04 5.97400e-01 ... 2.25800e+04 3.00000e+04
5.70600e-01]
[6.46150e+04 6.27120e+04 9.70500e-01 ... 9.61500e+03 5.50000e+04
8.51200e-01]
[5.25800e+04 2.90390e+04 5.52300e-01 ... 2.22230e+04 3.03575e+04
5.77400e-01]], y_train = [1. 0. 1. ... 1. 1. 1.]
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:197: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
FutureWarning)
[WARNING] [2020-02-18 15:11:58,185:AutoMLSMBO(1)::cb28bbd020a0a08a3c17168f19c8aaae] Could not find meta-data directory /usr/local/lib/python3.6/dist-packages/autosklearn/metalearning/files/r2_regression_dense
[WARNING] [2020-02-18 15:11:58,212:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[WARNING] [2020-02-18 15:11:58,224:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[WARNING] [2020-02-18 15:12:00,228:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[(0.340000, SimpleRegressionPipeline({'categorical_encoding:choice': 'one_hot_encoding', 'imputation:strategy': 'median', 'preprocessor:choice': 'extra_trees_preproc_for_regression', 'regressor:choice': 'ridge_regression', 'rescaling:choice': 'quantile_transformer', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:extra_trees_preproc_for_regression:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_regression:criterion': 'mae', 'preprocessor:extra_trees_preproc_for_regression:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_regression:max_features': 0.8215479502881777, 'preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes': 'None', 'preprocessor:extra_trees_preproc_for_regression:min_samples_leaf': 11, 'preprocessor:extra_trees_preproc_for_regression:min_samples_split': 9, 'preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_regression:n_estimators': 100, 'regressor:ridge_regression:alpha': 4.563743442447699, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 4.8339309027613326e-05, 'rescaling:quantile_transformer:n_quantiles': 572, 'rescaling:quantile_transformer:output_distribution': 'uniform', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.022216999044307732},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.340000, SimpleRegressionPipeline({'categorical_encoding:choice': 'one_hot_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:choice': 'fast_ica', 'regressor:choice': 'extra_trees', 'rescaling:choice': 'minmax', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'False', 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'regressor:extra_trees:bootstrap': 'False', 'regressor:extra_trees:criterion': 'friedman_mse', 'regressor:extra_trees:max_depth': 'None', 'regressor:extra_trees:max_features': 0.343851332296278, 'regressor:extra_trees:max_leaf_nodes': 'None', 'regressor:extra_trees:min_impurity_decrease': 0.0, 'regressor:extra_trees:min_samples_leaf': 14, 'regressor:extra_trees:min_samples_split': 5, 'regressor:extra_trees:n_estimators': 100},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.260000, SimpleRegressionPipeline({'categorical_encoding:choice': 'one_hot_encoding', 'imputation:strategy': 'mean', 'preprocessor:choice': 'no_preprocessing', 'regressor:choice': 'random_forest', 'rescaling:choice': 'standardize', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'regressor:random_forest:bootstrap': 'True', 'regressor:random_forest:criterion': 'mse', 'regressor:random_forest:max_depth': 'None', 'regressor:random_forest:max_features': 1.0, 'regressor:random_forest:max_leaf_nodes': 'None', 'regressor:random_forest:min_impurity_decrease': 0.0, 'regressor:random_forest:min_samples_leaf': 1, 'regressor:random_forest:min_samples_split': 2, 'regressor:random_forest:min_weight_fraction_leaf': 0.0, 'regressor:random_forest:n_estimators': 100, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.01},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.040000, SimpleRegressionPipeline({'categorical_encoding:choice': 'one_hot_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:choice': 'fast_ica', 'regressor:choice': 'ridge_regression', 'rescaling:choice': 'standardize', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:fast_ica:algorithm': 'deflation', 'preprocessor:fast_ica:fun': 'exp', 'preprocessor:fast_ica:whiten': 'True', 'regressor:ridge_regression:alpha': 1.3608642297867532e-05, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 0.002596874543719601, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.00017348437847697216, 'preprocessor:fast_ica:n_components': 1058},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.020000, SimpleRegressionPipeline({'categorical_encoding:choice': 'no_encoding', 'imputation:strategy': 'median', 'preprocessor:choice': 'select_percentile_regression', 'regressor:choice': 'ridge_regression', 'rescaling:choice': 'quantile_transformer', 'preprocessor:select_percentile_regression:percentile': 82.56436225708288, 'preprocessor:select_percentile_regression:score_func': 'mutual_info', 'regressor:ridge_regression:alpha': 1.6259354959848533, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 0.005858793476627702, 'rescaling:quantile_transformer:n_quantiles': 431, 'rescaling:quantile_transformer:output_distribution': 'normal'},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
]
R2 score: 0.12086525801756198

real 1m58.008s
user 2m17.253s
sys 0m12.919s
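
For reference when reading the score above, here is a minimal sketch of how R2 (the coefficient of determination) is usually defined: 1 minus the ratio of the residual sum of squares to the total sum of squares. Under that definition a model that always predicts the mean of the targets scores exactly 0, so an R2 of .12 is only slightly better than that constant baseline. The function name `r2_score_manual` and the sample values are hypothetical and are not taken from the script or dataset in this issue.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (assumption), not the issue's data.
y_true = [1.0, 0.0, 1.0, 1.0]
mean_pred = [0.75] * len(y_true)   # constant "dummy" prediction
perfect_pred = list(y_true)        # exact predictions
wrong_pred = [0.0, 1.0, 0.0, 0.0]  # worse than the mean baseline

print(r2_score_manual(y_true, mean_pred))     # 0.0
print(r2_score_manual(y_true, perfect_pred))  # 1.0
print(r2_score_manual(y_true, wrong_pred))    # negative
```

When comparing against the .55 from multiple regression, it may be worth checking that both numbers were computed on the same held-out rows, since an R2 computed on the training data is typically higher than one computed on unseen data.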
