We are introducing a new multitask parametrization in BoTorch. This update aims to improve the modeling of correlations between tasks and address some apparent shortcomings of the existing parametrization. For the MultiTaskGP models, we make the following changes to the defaults:
- `ConstantMean` --> `MultiTaskMean`
- `IndexKernel` --> `PositiveIndexKernel`

However, these can easily be swapped out for the older versions by providing a custom `mean_module` or `task_covar_module` to the model.

Motivation
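To make the two default changes concrete, here is a schematic NumPy sketch of what each parametrization expresses. This is illustrative only — not BoTorch's actual implementation — and all names and numbers below are made up for the example:

```python
import numpy as np

# Schematic sketch of the two default changes. Illustrative only --
# not BoTorch's actual implementation.

num_tasks = 3
task_idx = np.array([0, 0, 1, 2, 2])  # task index of each data point

# ConstantMean: a single constant shared by all tasks.
shared_mean = np.full(task_idx.shape, 0.5)

# MultiTaskMean (schematically): one constant per task, selected by the
# task index, so constant offsets between tasks can be absorbed.
per_task_means = np.array([0.2, 0.5, -0.1])
mean_per_point = per_task_means[task_idx]  # [0.2, 0.2, 0.5, -0.1, -0.1]

# IndexKernel: task covariance B = F F^T + diag(v); off-diagonal
# entries (cross-task covariances) may come out negative.
rng = np.random.default_rng(0)
F = rng.standard_normal((num_tasks, 1))
v = np.full(num_tasks, 0.1)
B = F @ F.T + np.diag(v)

# PositiveIndexKernel (schematically): constrain the factor to be
# elementwise positive, e.g. via a softplus, so every entry of B is
# nonnegative and cross-task correlations cannot be negative.
F_pos = np.logaddexp(0.0, F)  # softplus(F) > 0
B_pos = F_pos @ F_pos.T + np.diag(v)
assert (B_pos >= 0).all()
```

The key structural point: the mean becomes a lookup indexed by task rather than a shared scalar, and the task covariance factor is squashed through a positivity-preserving transform before forming the outer product.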
The previous multitask parametrization in BoTorch had limitations in capturing the correlations between tasks, often leading to suboptimal performance. In general, and especially in the very low-data regime, it is extremely hard to infer cross-task correlations. By reducing the flexibility of the model and being optimistic that cross-task correlations are positive, we can improve modeling and optimization performance. With very little data and unconstrained task correlations, both perfect negative and perfect positive correlation are plausible. The MLL objective is rewarded simply for correlating the data while minimizing the residuals, making negative task correlations a fairly common occurrence.
Implementation Details
`MultiTaskMean`: infers one mean function per task. For any type of constant offset between tasks, or with the use of `StratifiedStandardize`, this parametrization is paramount. In low-data regimes, `StratifiedStandardize`'s mean parameters will likely differ even if the tasks in the MTGP are identical, making `MultiTaskMean` particularly important when this transform is used.

`PositiveIndexKernel`: inferring task correlations is difficult. When data is scarce and non-identical across tasks (e.g., 3 random points per task on 2 Branin tasks), task correlation inference can be very spotty, to the point where -1 and +1 are inferred with almost equal frequency in extreme cases (see figure below). In addition to more accurate task correlation inference (assuming task correlations are indeed positive), the change to `PositiveIndexKernel` aligns better with practitioners' intuition: that source tasks are indeed positively correlated with the target task.

Results
In the figure below, we show the inferred task correlation of GPs with different parametrizations, ranging from the current default `MultiTaskGP` (red) and the addition of either component in isolation (yellow, green) to the new default (blue).
For all three functions (Branin (2D), Forrester (1D), Hartmann (3D)), we use the same function as the source and as the target, so we try to infer, e.g., the correlation between Branin and itself. For the target task, we use three data points, and two source tasks (16 data points in total) with three different degrees of data imbalance: 1:1 means 8 points from each source task, 3:1 means 12 points from task A and 4 from task B, and 7:1 means 14 from task A and 2 from task B. All data is drawn uniformly at random. We evaluate the rank correlation on a held-out set on the target task, i.e., the ability of the model to predict the rankings of unseen data points. Both the old and the partially new parametrizations perform rather poorly on this task, whereas the new one successfully infers the rankings of unseen data about as well as if the source and target data were all slotted into one shared `SingleTaskGP` (black dashed line). This is (arguably) the performance to expect in this setting, as the source and target data are perfectly correlated and on the same scale.

Potential future changes
The `SaasFullyBayesianMultiTaskGP` will likely undergo similar changes. Moreover, the transforms applied to multi-task modeling (specifically `StratifiedStandardize`) are currently under evaluation.

Potential downsides - when you'd not want to use this
The change from `IndexKernel` to `PositiveIndexKernel` is made with optimization in mind. Specifically, this means typical BO scenarios where data is scarce and tasks are assumed to be correlated, but there isn't enough evidence to robustly estimate the true correlation. In these settings, constraining correlations to be positive can improve modeling stability and optimization performance.
However, `PositiveIndexKernel` may not be ideal for regression tasks. If you have access to large amounts of data and your goal is to determine whether tasks are correlated (positively or negatively), the standard `IndexKernel` is preferable.
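The expressiveness trade-off can be seen directly in the task covariance matrix. A tiny NumPy sketch (hypothetical numbers, not BoTorch code): an unconstrained index-kernel factor can encode negative cross-task correlation, while an elementwise-positive factor never can.

```python
import numpy as np

# Unconstrained factor: B = F F^T + diag(v) with mixed-sign entries
# can represent anti-correlated tasks.
F = np.array([[1.0], [-1.0]])   # task 1 loads negatively
v = np.array([0.1, 0.1])
B = F @ F.T + np.diag(v)
corr = B[0, 1] / np.sqrt(B[0, 0] * B[1, 1])
print(corr)  # about -0.91: strong negative cross-task correlation

# Elementwise-positive factor (the PositiveIndexKernel idea): each
# off-diagonal entry of B is a sum of products of positive numbers,
# so the implied cross-task correlation can never be negative.
F_pos = np.abs(F)
B_pos = F_pos @ F_pos.T + np.diag(v)
assert B_pos[0, 1] >= 0
```

If your data actually comes from anti-correlated tasks, the positive parametrization cannot represent that structure, which is exactly why the unconstrained `IndexKernel` remains the better choice for correlation-discovery regression settings.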