auto-sklearn produces probability matrix inconsistent with training input

## Describe the bug ##
When the dataset contains very rare classes and is large enough to be subsampled, auto-sklearn can produce a probability matrix with fewer columns than there are classes in the training data.

## To Reproduce ##
```python
import numpy as np
from autosklearn.experimental.askl2 import AutoSklearn2Classifier

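# 60,000,017 rows: large enough that auto-sklearn subsamples under the memory limit
x = np.random.random(size=(60_000_017, 10))
# 19 classes: labels 1 and 2 have 30,000,000 members each, labels 3..19 have one member each
y = np.asarray([1]*30_000_000 + [2]*30_000_000 + list(range(3,20)))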

aml = AutoSklearn2Classifier(time_left_for_this_task=60, memory_limit=10_000)
aml.fit(x, y)
predictions = aml.predict(x)
probabilities = aml.predict_proba(x)

print(probabilities.shape)
```
> (60000017, 5)

Alternatively, though much slower, the issue can be reproduced with the automl benchmark on KDDCup:

 > python runbenchmark.py autosklearn2:latest openml/t/360112 1h8c -f 5 -m docker -s force
 
## Expected behavior ##
The number of columns in the probability matrix should match the number of classes in the training data.

> (60000017, 19)

Alternatively, there should be a way to tell which column belongs to which class, and for which classes no predictions have been made.
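For illustration, if the list of classes that the fitted model actually saw were available, the matrix could be padded back to the full class set on the user side. The sketch below is a minimal example of that idea. `expand_proba` is a hypothetical helper and not part of auto-sklearn, and the `seen_classes` values are made up. Whether the estimator exposes the post-subsampling class list (for example through a scikit-learn style `classes_` attribute) is an assumption I have not verified.

```python
import numpy as np


def expand_proba(proba, seen_classes, all_classes):
    """Pad a probability matrix with zero columns for unseen classes.

    Hypothetical helper (not part of auto-sklearn). `seen_classes` must list
    the class label of each column of `proba`, in column order.
    """
    proba = np.asarray(proba, dtype=float)
    seen_classes = np.asarray(seen_classes)
    all_classes = np.unique(all_classes)  # sorted, de-duplicated labels

    if proba.shape[1] != len(seen_classes):
        raise ValueError("proba must have one column per seen class")

    # Column index of every seen class within the full, sorted class list.
    idx = np.searchsorted(all_classes, seen_classes)
    in_range = idx < len(all_classes)
    if not in_range.all() or not np.array_equal(all_classes[idx], seen_classes):
        raise ValueError("seen_classes contains labels missing from all_classes")

    full = np.zeros((proba.shape[0], len(all_classes)), dtype=float)
    full[:, idx] = proba  # unseen classes keep probability 0
    return full


# Usage with made-up values: 19 training classes, 5 of which survived subsampling.
all_classes = np.arange(1, 20)
seen_classes = np.array([1, 2, 7, 11, 16])  # assumed, for illustration only
proba = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
                  [0.1, 0.2, 0.3, 0.2, 0.2]])
full = expand_proba(proba, seen_classes, all_classes)
print(full.shape)  # (2, 19)
```

This only works if the column-to-class mapping is known, which is exactly the information that is missing at the moment.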

## Actual behavior, stacktrace or logfile ##
```
(venv) root@486c0ae472af:/bench# python mwe.py
[WARNING] [2021-07-27 16:19:41,000:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Dataset too large for memory limit 10000MB, reducing the precision from float64 to <class 'numpy.float32'>
[WARNING] [2021-07-27 16:19:42,210:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Dataset too large for memory limit 10000MB, reducing number of samples from 60000017 to 13107200.
[WARNING] [2021-07-27 16:19:45,795:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Could not sample dataset in stratified manner, resorting to random sampling
Traceback (most recent call last):
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 940, in subsample_if_too_large
    stratify=y,
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1387, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/smac/intensification/parallel_scheduling.py:152: UserWarning: SuccessiveHalving is intended to be used with more than 1 worker but num_workers=1
  num_workers
(60000017, 5)
```
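The log shows that stratified subsampling fails because of the single-member classes, after which auto-sklearn falls back to random sampling. A scaled-down sketch of why that fallback loses classes is below. It uses only the Python standard library rather than auto-sklearn's own sampling code, and the seed and the 1000x smaller class sizes are arbitrary choices for illustration.

```python
import random

# Scaled-down version of the label vector from the report:
# two large classes plus 17 classes with a single member each.
y = [1] * 30_000 + [2] * 30_000 + list(range(3, 20))

# Keep the same fraction of rows that the log reports (13107200 of 60000017).
fraction = 13_107_200 / 60_000_017
n_keep = int(len(y) * fraction)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for this sketch
sample = rng.sample(y, n_keep)  # plain random sampling without replacement

n_classes_total = len(set(y))
n_classes_kept = len(set(sample))
print(f"classes in full data: {n_classes_total}")
print(f"classes after random subsampling: {n_classes_kept}")

# Each single-member class survives with probability of roughly `fraction`,
# so on average about 2 + 17 * fraction classes remain.
expected_kept = 2 + 17 * fraction
print(f"expected number of surviving classes: {expected_kept:.1f}")
```

With roughly 22% of the rows kept, only about 5.7 of the 19 classes are expected to survive, which is in line with the 5 columns observed above.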

## Environment and installation: ##

Please give details about your installation:

* OS: Debian 10 in docker hosted by Windows 10
* Installed in a virtual environment
* Python version: 3.7.11
* Auto-sklearn version: development (`11afae22b8c9a6309d2b6fcf7cfb9a947711cd1e`)
