Source code for autosklearn.estimators

@@ -269,39 +268,58 @@
             'feature_preprocessor': ["no_preprocessing"]
         }

-    resampling_strategy : str | BaseCrossValidator | _RepeatedSplits | BaseShuffleSplit = "holdout"
+    resampling_strategy : Union[str, BaseCrossValidator, _RepeatedSplits, BaseShuffleSplit] = "holdout"
         How to handle overfitting, might need to use ``resampling_strategy_arguments``
         if using ``"cv"`` based method or a Splitter object.

-        * **Options**
-            * ``"holdout"`` - Use a 67:33 (train:test) split
-            * ``"cv"``: perform cross validation, requires "folds" in ``resampling_strategy_arguments``
-            * ``"holdout-iterative-fit"`` - Same as "holdout" but iterative fit where possible
-            * ``"cv-iterative-fit"``: Same as "cv" but iterative fit where possible
-            * ``"partial-cv"``: Same as "cv" but uses intensification.
-            * ``BaseCrossValidator`` - any BaseCrossValidator subclass (found in scikit-learn model_selection module)
-            * ``_RepeatedSplits`` - any _RepeatedSplits subclass (found in scikit-learn model_selection module)
-            * ``BaseShuffleSplit`` - any BaseShuffleSplit subclass (found in scikit-learn model_selection module)
-
         If using a Splitter object that relies on the dataset retaining its current
         size and order, you will need to look at the ``dataset_compression`` argument
         and ensure that ``"subsample"`` is not included in the applied compression
         ``"methods"`` or disable it entirely with ``False``.

-    resampling_strategy_arguments : Optional[Dict]
-        Additional arguments for ``resampling_strategy``, this is required if
-        using a ``cv`` based strategy:
-
-        .. code-block:: python
-
-            {
-                "train_size": 0.67,  # The size of the training set
-                "shuffle": True,     # Whether to shuffle before splitting data
-                "folds": 5           # Used in 'cv' based resampling strategies
-            }
-
-        If using a custom splitter class, which takes ``n_splits`` such as
-        `PredefinedSplit <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn-model-selection-kfold>`_, the value of ``"folds"`` will be used.
+        **Options**
+
+        * ``"holdout"``:
+            67:33 (train:test) split
+        * ``"holdout-iterative-fit"``:
+            67:33 (train:test) split, iterative fit where possible
+        * ``"cv"``:
+            cross-validation,
+            requires ``"folds"`` in ``resampling_strategy_arguments``
+        * ``"cv-iterative-fit"``:
+            cross-validation,
+            calls iterative fit where possible,
+            requires ``"folds"`` in ``resampling_strategy_arguments``
+        * ``"partial-cv"``:
+            cross-validation with intensification,
+            requires ``"folds"`` in ``resampling_strategy_arguments``
+        * ``BaseCrossValidator`` subclass:
+            any BaseCrossValidator subclass (found in scikit-learn model_selection module)
+        * ``_RepeatedSplits`` subclass:
+            any _RepeatedSplits subclass (found in scikit-learn model_selection module)
+        * ``BaseShuffleSplit`` subclass:
+            any BaseShuffleSplit subclass (found in scikit-learn model_selection module)
+
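For illustration, a minimal sketch of passing a splitter object from the list above. The ``autosklearn.classification.AutoSklearnClassifier`` entry point is the documented one; the splitter choice and time limit are assumptions:

.. code-block:: python

    from sklearn.model_selection import ShuffleSplit
    from autosklearn.classification import AutoSklearnClassifier

    # Any BaseShuffleSplit subclass instance works as a resampling strategy;
    # here a single 67:33 shuffled split, mirroring the "holdout" default.
    splitter = ShuffleSplit(n_splits=1, test_size=0.33, random_state=1)
    automl = AutoSklearnClassifier(
        resampling_strategy=splitter,
        time_left_for_this_task=60,  # seconds; illustrative only
    )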
+    resampling_strategy_arguments : dict, optional if 'holdout' (train_size default=0.67)
+        Additional arguments for ``resampling_strategy``:
+
+        * ``train_size`` should be between 0.0 and 1.0 and represent the
+          proportion of the dataset to include in the train split.
+        * ``shuffle`` determines whether the data is shuffled prior to
+          splitting it into train and validation.
+
+        Available arguments:
+
+        * 'holdout': {'train_size': float}
+        * 'holdout-iterative-fit': {'train_size': float}
+        * 'cv': {'folds': int}
+        * 'cv-iterative-fit': {'folds': int}
+        * 'partial-cv': {'folds': int, 'shuffle': bool}
+        * BaseCrossValidator or _RepeatedSplits or BaseShuffleSplit object: all arguments
+          required by the chosen class, as specified in the scikit-learn documentation.
+          If arguments are not provided, scikit-learn defaults are used.
+          If no defaults are available, an exception is raised.
+          Pass the ``n_splits`` argument as ``'folds'``.
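A companion sketch for the ``"cv"`` strategy, which requires ``"folds"`` (same assumed entry point as above):

.. code-block:: python

    from autosklearn.classification import AutoSklearnClassifier

    # "cv" must be given the number of folds via resampling_strategy_arguments.
    automl = AutoSklearnClassifier(
        resampling_strategy="cv",
        resampling_strategy_arguments={"folds": 5},
    )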

     tmp_folder : string, optional (None)
         folder to store configuration output and log files, if ``None``
@@ -313,12 +331,12 @@

     n_jobs : int, optional, experimental
         The number of jobs to run in parallel for ``fit()``. ``-1`` means
         using all processors.

         **Important notes**:

         * By default, Auto-sklearn uses one core.
         * Ensemble building is not affected by ``n_jobs`` but can be controlled by the number
           of models in the ensemble.
         * ``predict()`` is not affected by ``n_jobs`` (in contrast to most scikit-learn models)
         * If ``dask_client`` is ``None``, a new dask client is created.
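A hedged sketch of the parallel setup described above; ``n_jobs`` and ``dask_client`` come from this docstring, while the worker count and dask configuration are assumptions:

.. code-block:: python

    from dask.distributed import Client
    from autosklearn.classification import AutoSklearnClassifier

    # Supply an existing dask client; otherwise auto-sklearn creates its own.
    client = Client(n_workers=4)  # assumption: a local cluster with 4 workers
    automl = AutoSklearnClassifier(
        n_jobs=4,            # parallelise fit() across 4 jobs
        dask_client=client,
    )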
@@ -382,14 +400,16 @@

     dataset_compression: Union[bool, Mapping[str, Any]] = True
         We compress datasets so that they fit into some predefined amount of memory.
-        Currently this does not apply to dataframes or sparse arrays, only to raw
-        numpy arrays.
+        Currently this does not apply to dataframes or sparse arrays, only to raw numpy arrays.

-        **NOTE** - If using a custom ``resampling_strategy`` that relies on specific
+        **NOTE**
+
+        If using a custom ``resampling_strategy`` that relies on specific
         size or ordering of data, this must be disabled to preserve these properties.

-        You can disable this entirely by passing ``False`` or leave as the default
-        ``True`` for configuration below.
+        You can disable this entirely by passing ``False``.
+
+        Default configuration when left as ``True``:

         .. code-block:: python

@@ -403,36 +423,36 @@

         The available options are described here:

-        * **memory_allocation**
-            By default, we attempt to fit the dataset into ``0.1 * memory_limit``.
-            This float value can be set with ``"memory_allocation": 0.1``.
-            We also allow for specifying absolute memory in MB, e.g. 10MB is
-            ``"memory_allocation": 10``.
-
-            The memory used by the dataset is checked after each reduction method is
-            performed. If the dataset fits into the allocated memory, any further
-            methods listed in ``"methods"`` will not be performed.
-
-            For example, if ``methods: ["precision", "subsample"]`` and the
-            ``"precision"`` reduction step was enough to make the dataset fit into
-            memory, then the ``"subsample"`` reduction step will not be performed.
-
-        * **methods**
-            We provide the following methods for reducing the dataset size.
-            These can be provided in a list and are performed in the order as given.
-
-            * ``"precision"`` - We reduce floating point precision as follows:
-                * ``np.float128 -> np.float64``
-                * ``np.float96 -> np.float64``
-                * ``np.float64 -> np.float32``
-
-            * ``subsample`` - We subsample data such that it **fits directly into
-              the memory allocation** ``memory_allocation * memory_limit``.
-              Therefore, this should likely be the last method listed in
-              ``"methods"``.
-              Subsampling takes into account classification labels and stratifies
-              accordingly. We guarantee that at least one occurrence of each
-              label is included in the sampled set.
+        **memory_allocation**
+
+        By default, we attempt to fit the dataset into ``0.1 * memory_limit``. This
+        float value can be set with ``"memory_allocation": 0.1``. We also allow for
+        specifying absolute memory in MB, e.g. 10MB is ``"memory_allocation": 10``.
+
+        The memory used by the dataset is checked after each reduction method is
+        performed. If the dataset fits into the allocated memory, any further methods
+        listed in ``"methods"`` will not be performed.
+
+        For example, if ``methods: ["precision", "subsample"]`` and the
+        ``"precision"`` reduction step was enough to make the dataset fit into memory,
+        then the ``"subsample"`` reduction step will not be performed.
+
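As a sketch of the ``memory_allocation`` option just described; the mapping keys come from this docstring, while the concrete values and the surrounding constructor call are assumptions:

.. code-block:: python

    from autosklearn.classification import AutoSklearnClassifier

    # Give the dataset 20% of memory_limit; an int such as 128 would instead
    # request an absolute budget of 128 MB.
    automl = AutoSklearnClassifier(
        dataset_compression={
            "memory_allocation": 0.2,
            "methods": ["precision", "subsample"],  # order matters: checked after each step
        },
    )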
+        **methods**
+
+        We currently provide the following methods for reducing the dataset size.
+        These can be provided in a list and are performed in the order as given.
+
+        * ``"precision"`` - We reduce floating point precision as follows:
+            * ``np.float128 -> np.float64``
+            * ``np.float96 -> np.float64``
+            * ``np.float64 -> np.float32``
+
+        * ``"subsample"`` - We subsample data such that it **fits directly into the
+          memory allocation** ``memory_allocation * memory_limit``. Therefore, this
+          should likely be the last method listed in ``"methods"``.
+          Subsampling takes into account classification labels and stratifies
+          accordingly. We guarantee that at least one occurrence of each label is
+          included in the sampled set.
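And a sketch combining ``methods`` with the earlier note about custom splitters: ``"subsample"`` changes dataset size, so it is left out (or compression disabled outright) when a size-sensitive splitter is used. Entry point as above; the splitter and fold markers are assumptions:

.. code-block:: python

    from sklearn.model_selection import PredefinedSplit
    from autosklearn.classification import AutoSklearnClassifier

    # PredefinedSplit assumes a fixed row order, so "subsample" is omitted and
    # only the precision reduction, which preserves row count and order, is applied.
    splitter = PredefinedSplit(test_fold=[-1, -1, 0, 0])  # illustrative fold markers
    automl = AutoSklearnClassifier(
        resampling_strategy=splitter,
        dataset_compression={"memory_allocation": 0.1, "methods": ["precision"]},
    )

    # Alternatively, disable compression entirely:
    # automl = AutoSklearnClassifier(resampling_strategy=splitter,
    #                                dataset_compression=False)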

     Attributes
     ----------