Crash under specific input example #1186

@PGijsbers

Describe the bug

I was trying to create a minimal working example for an issue we have on real data (KDDCup).
Along the way I found this (different) error, which is raised when producing predictions.

I'm fine with a "won't fix", but I figured I would share it so you can check whether it points to a more serious underlying issue.

To Reproduce

Installed from the development branch.

import numpy as np
from autosklearn.experimental.askl2 import AutoSklearn2Classifier

x = np.random.random(size=(150, 4))
y = np.asarray([1]*75 + [2]*74 + [3])

aml = AutoSklearn2Classifier(time_left_for_this_task=60)
aml.fit(x, y)
predictions = aml.predict(x)

The single sample for class 3 seems crucial: the other configurations I tried did not produce the error.
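
One possible explanation, offered as a hypothesis and not a confirmed diagnosis: if the models are trained with k-fold resampling, the fold that holds out the lone class-3 sample trains on data containing only two classes. The sketch below illustrates this with plain contiguous, unshuffled folds. The helper `folds_missing_classes` and the choice of 5 folds are assumptions made for illustration; this is not necessarily how auto-sklearn splits the data.

```python
from collections import Counter


def folds_missing_classes(y, n_splits=5):
    """Return {fold_index: sorted missing classes} for a simple k-fold split.

    Hypothetical helper: `y` is a plain list of labels, and the folds are
    contiguous and unshuffled, which is an assumption for illustration only.
    """
    all_classes = set(y)
    n = len(y)
    # The first (n % n_splits) folds receive one extra sample.
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    missing = {}
    start = 0
    for fold, size in enumerate(sizes):
        stop = start + size
        # Training labels are everything outside the held-out slice.
        train_labels = y[:start] + y[stop:]
        absent = all_classes - set(train_labels)
        if absent:
            missing[fold] = sorted(absent)
        start = stop
    return missing


# Same label layout as in the reproduction script.
y = [1] * 75 + [2] * 74 + [3]
print(Counter(y))
print(folds_missing_classes(y))
```

For the labels above, exactly one fold (the one holding out the last sample) trains without class 3, so a model fitted on that fold would know only two classes.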

Expected behavior

Predictions to be produced.

Actual behavior, stacktrace or logfile

(venv) root@486c0ae472af:/bench# python mwe.py
/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/smac/intensification/parallel_scheduling.py:152: UserWarning: SuccessiveHalving is intended to be used with more than 1 worker but num_workers=1
  num_workers
[WARNING] [2021-07-27 15:07:04,115:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 1. Number of dummy models: 1
Traceback (most recent call last):
  File "mwe.py", line 9, in <module>
    predictions = aml.predict(x)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/estimators.py", line 695, in predict
    return super().predict(X, batch_size=batch_size, n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/estimators.py", line 494, in predict
    return self.automl_.predict(X, batch_size=batch_size, n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 1703, in predict
    n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 1230, in predict
    for identifier in self.ensemble_.get_selected_model_identifiers()
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 96, in _model_predict
    prediction = model.predict_proba(X_)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/ensemble/_voting.py", line 329, in _predict_proba
    avg = np.average(self._collect_probas(X), axis=0,
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/ensemble/_voting.py", line 324, in _collect_probas
    return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
ValueError: could not broadcast input array from shape (150,3) into shape (150,)
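
The message suggests that NumPy was asked to pack probability arrays of different shapes into a single array. A minimal sketch of that failure mode follows, assuming (unconfirmed) that one ensemble member was trained without class 3 and therefore returns two probability columns instead of three. `pad_proba` is a hypothetical helper, not part of scikit-learn or auto-sklearn, and the sketch uses `np.stack` because the behavior of `np.asarray` on mismatched shapes varies between NumPy versions.

```python
import numpy as np


def pad_proba(proba, model_classes, all_classes):
    """Expand `proba` to one column per entry of `all_classes`.

    Columns for classes the model never saw are filled with zeros.
    Hypothetical helper for illustration only.
    """
    all_classes = list(all_classes)
    padded = np.zeros((proba.shape[0], len(all_classes)), dtype=float)
    for col, cls in enumerate(model_classes):
        padded[:, all_classes.index(cls)] = proba[:, col]
    return padded


rng = np.random.default_rng(0)
full = rng.dirichlet(np.ones(3), size=150)     # model that saw classes 1, 2, 3
partial = rng.dirichlet(np.ones(2), size=150)  # model that saw only classes 1, 2

try:
    np.stack([full, partial])  # shapes (150, 3) and (150, 2) differ
except ValueError as exc:
    print("stacking failed:", exc)

# After padding, both arrays have one column per class and can be stacked.
aligned = np.stack([full, pad_proba(partial, [1, 2], [1, 2, 3])])
print(aligned.shape)  # (2, 150, 3)
```

Zero-filling keeps each row summing to 1, so averaging across ensemble members still yields valid probabilities. Whether this is the right fix for auto-sklearn is a separate question.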

Environment and installation:

Please give details about your installation:

  • OS: Debian 10 in docker hosted by Windows 10
  • virtual environment
  • Python version: 3.7.11
  • Auto-sklearn version: development (11afae22b8c9a6309d2b6fcf7cfb9a947711cd1e)
