Unexpected behaviour of InverseCostWeightedUtility for negative AF values #2194
Unanswered
AlexanderMouton asked this question in Q&A
Replies: 1 comment 4 replies
Sorry for the delayed response here @AlexanderMouton - I haven't thought through this in too much detail, but I think this fix makes sense to me. cc @SebastianAment, @dme65, @sdaulton re MF.
Hi,

I am doing research on multi-fidelity BO using the `qMultiFidelityKnowledgeGradient` acquisition function (AF). Building on discussion #1977, negative AF values can arise if the KG AF isn't optimised perfectly.

`InverseCostWeightedUtility` scales the AF values of candidate vectors based on their cost. An excerpt of the source code:

```python
...
# clamp (away from zero) and sum cost across elements of the q-batch
# this will be of shape `num_fantasies x batch_shape` or `batch_shape`
cost = cost.clamp_min(self._min_cost).sum(dim=-1)
# if we are doing inverse weighting on the sample level, clamp numerator.
if not self._use_mean:
    deltas = deltas.clamp_min(0.0)
# compute and return the ratio on the sample level - If `use_mean=True`
# this operation involves broadcasting the cost across fantasies
return deltas / cost
```

This works fine for positive AF values, for example:
(delta, cost) -> cost-weighted AF value
(0.5, 10) -> 0.05
(0.5, 5) -> 0.1
In other words, of two candidates with the same AF value, the cheaper one receives the higher cost-weighted AF value.
But when the maximising AF value is negative:
(delta, cost) -> cost-weighted AF value
(-0.5, 10) -> -0.05
(-0.5, 5) -> -0.1
Now the more costly candidate (with the same AF value) receives the higher cost-weighted AF value, which is counterintuitive.
I ran into this issue when I saw that my MFBO algorithm almost always selected the next point to evaluate at the highest fidelity when the maximising AF value was negative.
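The sign flip above can be reproduced with a plain-Python sketch of the division step (this is only an illustration of the arithmetic, not the actual BoTorch tensor code):

```python
# Sketch of the inverse-cost weighting applied when `use_mean=True`:
# the acquisition value (delta) is divided by the (positive) cost.
def inverse_cost_weight(delta: float, cost: float) -> float:
    """Cost-weighted acquisition value: delta / cost."""
    return delta / cost

# Positive deltas: the cheaper candidate wins, as intended.
assert inverse_cost_weight(0.5, 5) > inverse_cost_weight(0.5, 10)

# Negative deltas: the ordering flips -- the *more expensive*
# candidate now has the higher cost-weighted value.
assert inverse_cost_weight(-0.5, 10) > inverse_cost_weight(-0.5, 5)
```

Because dividing a negative number by a larger positive cost moves it *closer* to zero, higher cost is rewarded whenever all deltas are negative.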
My suggestion for an easy fix is to instead multiply the `deltas` by the cost when they are negative, i.e.:

(delta, cost) -> cost-weighted AF value
(0.5, 10) -> 0.5/10 -> 0.05
(0.5, 5) -> 0.5/5 -> 0.1
(-0.5, 10) -> -0.5*10 -> -5.0
(-0.5, 5) -> -0.5*5 -> -2.5
In the source code this would look like:

```python
...
# clamp (away from zero) and sum cost across elements of the q-batch
# this will be of shape `num_fantasies x batch_shape` or `batch_shape`
cost = cost.clamp_min(self._min_cost).sum(dim=-1)
# if we are doing inverse weighting on the sample level, clamp numerator.
if not self._use_mean:
    deltas = deltas.clamp_min(0.0)
# compute and return the ratio on the sample level - If `use_mean=True`
# this operation involves broadcasting the cost across fantasies
return torch.where(deltas > 0, deltas / cost, deltas * cost)
```

*Note that `self._use_mean` is `True` when following the same setup as in the MFKG tutorial, so `deltas` won't be clamped to be > 0.*
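The proposed piecewise weighting can be sketched in plain Python to check that it restores a consistent ordering (an illustration of the idea only; the BoTorch tensor version would use `torch.where` as above):

```python
# Proposed fix: divide positive deltas by cost, multiply negative
# deltas by cost, so higher cost always penalises the candidate.
def piecewise_cost_weight(delta: float, cost: float) -> float:
    return delta / cost if delta > 0 else delta * cost

# Positive deltas behave as before: cheaper candidate wins.
assert piecewise_cost_weight(0.5, 5) > piecewise_cost_weight(0.5, 10)

# Negative deltas: multiplying by cost pushes expensive candidates
# further from zero, so the cheaper candidate now wins too.
assert piecewise_cost_weight(-0.5, 10) == -5.0
assert piecewise_cost_weight(-0.5, 5) == -2.5
assert piecewise_cost_weight(-0.5, 5) > piecewise_cost_weight(-0.5, 10)
```

With this weighting, a cheaper candidate is preferred over an equally-valued expensive one regardless of the sign of the AF value.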