Does output with R2 below simple multiple regression indicate error or tuning need?

The run below produced an unusually low R2 of .12. The expected result was an R2 higher than the .55 obtained from multiple regression. Does this indicate a poor use case for auto-sklearn, a need for parameter or hyperparameter adjustments, or some other error in use?

15:11:48 PRIVATE python3 eluellen-sklearn.py
/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.classification module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
Samples = 2619, Features = 40
X_train = [[4.96200e+04 1.71090e+04 3.44800e-01 ... 1.70000e+04 3.26200e+04
  6.57400e-01]
 [5.95000e+04 0.00000e+00 0.00000e+00 ... 5.95000e+04 0.00000e+00
  0.00000e+00]
 [4.65400e+04 4.65400e+04 1.00000e+00 ... 1.15400e+04 3.50000e+04
  7.52000e-01]
 ...
 [5.25800e+04 3.14100e+04 5.97400e-01 ... 2.25800e+04 3.00000e+04
  5.70600e-01]
 [6.46150e+04 6.27120e+04 9.70500e-01 ... 9.61500e+03 5.50000e+04
  8.51200e-01]
 [5.25800e+04 2.90390e+04 5.52300e-01 ... 2.22230e+04 3.03575e+04
  5.77400e-01]], y_train = [1. 0. 1. ... 1. 1. 1.]
/usr/local/lib/python3.6/dist-packages/sklearn/base.py:197: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
  FutureWarning)
[WARNING] [2020-02-18 15:11:58,185:AutoMLSMBO(1)::cb28bbd020a0a08a3c17168f19c8aaae] Could not find meta-data directory /usr/local/lib/python3.6/dist-packages/autosklearn/metalearning/files/r2_regression_dense
[WARNING] [2020-02-18 15:11:58,212:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[WARNING] [2020-02-18 15:11:58,224:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[WARNING] [2020-02-18 15:12:00,228:EnsembleBuilder(1):cb28bbd020a0a08a3c17168f19c8aaae] No models better than random - using Dummy Score!
[(0.340000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'extra_trees_preproc_for_regression', 'regressor:__choice__': 'ridge_regression', 'rescaling:__choice__': 'quantile_transformer', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:extra_trees_preproc_for_regression:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_regression:criterion': 'mae', 'preprocessor:extra_trees_preproc_for_regression:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_regression:max_features': 0.8215479502881777, 'preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes': 'None', 'preprocessor:extra_trees_preproc_for_regression:min_samples_leaf': 11, 'preprocessor:extra_trees_preproc_for_regression:min_samples_split': 9, 'preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_regression:n_estimators': 100, 'regressor:ridge_regression:alpha': 4.563743442447699, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 4.8339309027613326e-05, 'rescaling:quantile_transformer:n_quantiles': 572, 'rescaling:quantile_transformer:output_distribution': 'uniform', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.022216999044307732},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
(0.340000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'fast_ica', 'regressor:__choice__': 'extra_trees', 'rescaling:__choice__': 'minmax', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'False', 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'regressor:extra_trees:bootstrap': 'False', 'regressor:extra_trees:criterion': 'friedman_mse', 'regressor:extra_trees:max_depth': 'None', 'regressor:extra_trees:max_features': 0.343851332296278, 'regressor:extra_trees:max_leaf_nodes': 'None', 'regressor:extra_trees:min_impurity_decrease': 0.0, 'regressor:extra_trees:min_samples_leaf': 14, 'regressor:extra_trees:min_samples_split': 5, 'regressor:extra_trees:n_estimators': 100},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
(0.260000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'no_preprocessing', 'regressor:__choice__': 'random_forest', 'rescaling:__choice__': 'standardize', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'regressor:random_forest:bootstrap': 'True', 'regressor:random_forest:criterion': 'mse', 'regressor:random_forest:max_depth': 'None', 'regressor:random_forest:max_features': 1.0, 'regressor:random_forest:max_leaf_nodes': 'None', 'regressor:random_forest:min_impurity_decrease': 0.0, 'regressor:random_forest:min_samples_leaf': 1, 'regressor:random_forest:min_samples_split': 2, 'regressor:random_forest:min_weight_fraction_leaf': 0.0, 'regressor:random_forest:n_estimators': 100, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.01},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
(0.040000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'fast_ica', 'regressor:__choice__': 'ridge_regression', 'rescaling:__choice__': 'standardize', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:fast_ica:algorithm': 'deflation', 'preprocessor:fast_ica:fun': 'exp', 'preprocessor:fast_ica:whiten': 'True', 'regressor:ridge_regression:alpha': 1.3608642297867532e-05, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 0.002596874543719601, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.00017348437847697216, 'preprocessor:fast_ica:n_components': 1058},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
(0.020000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'no_encoding', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'select_percentile_regression', 'regressor:__choice__': 'ridge_regression', 'rescaling:__choice__': 'quantile_transformer', 'preprocessor:select_percentile_regression:percentile': 82.56436225708288, 'preprocessor:select_percentile_regression:score_func': 'mutual_info', 'regressor:ridge_regression:alpha': 1.6259354959848533, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 0.005858793476627702, 'rescaling:quantile_transformer:n_quantiles': 431, 'rescaling:quantile_transformer:output_distribution': 'normal'},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
]
R2 score: 0.12086525801756198

real    1m58.008s
user    2m17.253s
sys     0m12.919s
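
For reference when reading the `R2 score` line above, here is a minimal sketch of how R2 is defined, written by hand rather than taken from the script. The function name `r2` and both lists of numbers are made up for illustration; they are not the data from this run. The sketch shows that a predictor returning the mean of the targets scores exactly 0, so an R2 of .12 is only slightly above that baseline on whatever rows it was evaluated on. Note that two R2 values are only comparable when both are computed on the same held-out rows; whether that holds for the .55 and the .12 here is not stated.

```python
from statistics import mean


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    # Sum of squared residuals of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the targets.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical 0/1 targets, for illustration only.
y_true = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]

# Predicting the mean everywhere explains none of the variance.
baseline = [mean(y_true)] * len(y_true)
print("mean baseline R2:", r2(y_true, baseline))

# Hypothetical model predictions; closer to the targets, so R2 is above 0.
y_pred = [0.9, 0.2, 0.8, 0.7, 0.4, 0.9, 0.6, 0.8]
print("example model R2:", r2(y_true, y_pred))

# Predictions worse than the mean baseline give a negative R2.
inverted = [1.0 - t for t in y_true]
print("inverted R2:", r2(y_true, inverted))
```

Scoring the multiple regression and the auto-sklearn ensemble with the same function on the same test split would show whether the gap is real or an artifact of comparing in-sample and out-of-sample fits.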

Does output with R2 below simple multiple regression indicate error or tuning need? #784

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Does output with R2 below simple multiple regression indicate error or tuning need? #784

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions