
Conversation

@ricardoV94 (Contributor) commented Jul 7, 2021

Closes #18.

TODO:

codecov bot commented Jul 7, 2021

Codecov Report

Merging #26 (f30b1c2) into main (8741683) will increase coverage by 0.38%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   94.36%   94.74%   +0.38%     
==========================================
  Files          11       11              
  Lines        1614     1733     +119     
  Branches      230      251      +21     
==========================================
+ Hits         1523     1642     +119     
  Misses         51       51              
  Partials       40       40              
Impacted Files         Coverage Δ
aeppl/transforms.py    96.43% <100.00%> (+1.94%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@brandonwillard added the enhancement and important labels on Jul 7, 2021

@brandonwillard (Member) left a comment

Can you add a somewhat high-level description of the approach and design of this feature?

I'm trying to reconcile this with the convolution and scaling rewrites we've been planning, and I want to make sure that we can do both without conflict and/or complications, or—better yet—that we can focus on only one approach/framework.

To be clear, I'm talking about RandomVariable-specific rewrites like sums of normals and location/scale properties. They're limited in that they're not as general as a change-of-variables approach, but they have the big advantage that they result in known distributions and they're rather simple to implement.
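For concreteness, these are the kinds of identities I mean (just a sketch of the target graphs, not actual rewrite code):

import aesara.tensor as at

# Sum of independent normals: Normal(m1, s1) + Normal(m2, s2) ~ Normal(m1 + m2, sqrt(s1**2 + s2**2))
x_sum = at.random.normal(1.0, 2.0) + at.random.normal(3.0, 4.0)
x_rewritten = at.random.normal(1.0 + 3.0, at.sqrt(2.0**2 + 4.0**2))

# Location/scale property: loc + scale * Normal(0, 1) ~ Normal(loc, scale)
loc, scale = at.scalars("loc", "scale")
y_affine = loc + scale * at.random.normal(0.0, 1.0)
y_rewritten = at.random.normal(loc, scale)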

It seems like we can always add such rewrites, but that they would need to be performed before these transforms. If that's all there is to it, then I think we're good; if not, then we don't want to dig ourselves into a hole.

Also, we might want to start focusing on those simple rewrites because of their ease and impact, so don't feel rushed to push this through before considering any of those.

@ricardoV94 (Contributor, Author) commented Aug 2, 2021

> Can you add a somewhat high-level description of the approach and design of this feature?

The new local rewrite basically checks whether there is a single chain of operations linking an "orphan" value variable to an unused base RandomVariable, and replaces the RV graph with a ChainTransformRV that contains the equivalent transformations.

y_rv = at.log(at.random.beta(1, 1) * scale)
# becomes
y_rv = ChainTransformRV(beta(1, 1), [ScaleTransform(scale), LogTransform()])

> I'm trying to reconcile this with the convolution and scaling rewrites we've been planning

> To be clear, I'm talking about RandomVariable-specific rewrites like sums of normals and location/scale properties. They're limited in that they're not as general as a change-of-variables approach, but they have the big advantage that they result in known distributions and they're rather simple to implement.

The idea would be to apply the loc-scale reparametrization whenever possible, as that results in more succinct graphs. Otherwise, this rewrite is more general in that it also applies to RVs that cannot be reparametrized that way.

In addition, it is not limited to loc/scale transforms. The reason I did not restrict it to that scenario is that, once we have the ChainTransformRV, no further complexity is needed to support other transformations.

By the way, convolutions are not addressed at all in the local rewrite (and I don't think they should be). I actually have an explicit check to make sure the graph does not correspond to a potential convolution.
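For illustration, such a check could look roughly like this (a sketch of the idea only, not the PR's actual code; the helper name is made up):

from aesara.tensor.random.op import RandomVariable


def might_be_convolution(node):
    # A sum/product of two (or more) RandomVariable outputs is a potential
    # convolution, not a deterministic transform of a single base RV, so the
    # rewrite bails out in that case.
    rv_inputs = [
        inp
        for inp in node.inputs
        if inp.owner is not None and isinstance(inp.owner.op, RandomVariable)
    ]
    return len(rv_inputs) > 1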

> It seems like we can always add such rewrites, but that they would need to be performed before these transforms. If that's all there is to it, then I think we're good; if not, then we don't want to dig ourselves into a hole.

Yes, this should not interfere with that at all. In fact, those rewrites can be applied just as easily before or after this one, since the ChainTransformRV includes all the information needed to perform the substitution (and we have already checked that the value variable really corresponds to the base RV and that the RV has not been used elsewhere in the graph).

y_rv = ChainTransformRV(Normal(0, 1), [ScaleTransform(scale), LocTransform(loc)])
# could be rewritten later (or before) as 
y_rv = Normal(0 + loc, 1 * scale)

In that sense, we could think of the smarter rewrites as a specialization of ChainTransformRVs. But, regardless of which rewrite is applied first, I don't see why the two approaches should conflict.

@brandonwillard added the rv-transforms label on Aug 17, 2021
@ricardoV94 (Contributor, Author) commented Aug 18, 2021

Okay, so my last roadblock is this edge case:

# loc = at.scalar("loc")  # works
# loc = at.vector("loc")  # fails
loc = at.TensorType("floatX", (True,))("loc")  # works

y_rv = loc + at.random.normal(0, 1, size=1)
y = y_rv.clone()

logp = joint_logprob(y_rv, {y_rv: y})

That example only works when loc is a scalar or the custom TensorType.
Otherwise it complains that I am trying to replace the vector y_rv by a LocTransformedRV of type TensorType(float64, (True,)), which is true.

TypeError: Cannot convert Type TensorType(float64, vector) (of Variable y) into Type TensorType(float64, (True,)). You can try to manually convert y into a TensorType(float64, (True,)).

Regardless of the broadcast question in #51, it seems the (1D) vector case should work by default, no?

After all, this is valid:

loc = at.vector("loc")
base_rv = at.random.normal(loc, 1, size=1)
base_rv.eval({loc: [1]})

The rewrite fails similarly with row or column sizes such as (1, 3) or (3, 1) when trying to use a generic matrix for loc / scale.

loc = at.matrix("loc")
y_rv = loc + at.random.normal(0, 1, size=(1, 3))
y = y_rv.clone()

logp = joint_logprob(y_rv, {y_rv: y})

Should this be handled by the new rewrite?

@ricardoV94 (Contributor, Author) commented Sep 4, 2021

My last commit provides a possible solution to the issue described in #26 (comment).

I am not very happy with it, but I haven't come up with a more satisfying solution yet.

Another option is to simply not support the cases where the original RV graph and the derived RV yield different broadcastable patterns.
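In code, that second option would just be a guard in the rewrite along these lines (an illustrative sketch; the function name is mine, not from the PR):

def types_match(rv, transformed_rv):
    # Only apply the rewrite when the replacement has the same broadcast
    # pattern as the variable it replaces; otherwise skip it and leave the
    # graph untouched (avoiding the TypeError shown above).
    return transformed_rv.broadcastable == rv.broadcastable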

@brandonwillard (Member) left a comment

If I'm correctly understanding the purpose of MeasurableRebroadcast, then it would appear that we need another canonicalization that lifts Rebroadcast Ops.

Can you provide an explicit example in which this issue arises, and show the relevant aesara.dprint output?

@ricardoV94 (Contributor, Author) commented Sep 15, 2021

> If I'm correctly understanding the purpose of MeasurableRebroadcast, then it would appear that we need another canonicalization that lifts Rebroadcast Ops.
>
> Can you provide an explicit example in which this issue arises, and show the relevant aesara.dprint output?

It appears in test_scale_transform when scalar_scale = False and rv_size = 1.

Note that the Rebroadcast is something I had to introduce in the rewrite phase so that I could replace the old generic vector node with what would otherwise be a new length-1 vector TransformedRV.

Relevant code:

if transform_rv_out.broadcastable != out_var.broadcastable:

@brandonwillard (Member):

> Note that the Rebroadcast is something I had to introduce in the rewrite phase so that I could replace the old generic vector node with what would otherwise be a new length-1 vector TransformedRV.

Broadcasting is just a subset of shape inference and—just like that—an Apply node's outputs' broadcast patterns are determined by its Op and inputs' broadcast patterns. This means that the mismatched outputs' broadcast patterns can theoretically be fixed by adjusting the broadcast patterns/shapes of their inputs.

The first thing that needs to be done in a situation like this is to determine whether the broadcasting inference of the Ops involved is working correctly. There have been quite a few instances in which an Op has been found to produce incorrect broadcast patterns for its output, and such things need to be addressed directly.

@ricardoV94 (Contributor, Author) commented Sep 16, 2021

Also, for comparison, I set up the similar test_loc_transform so that it passes without the additional Rebroadcast during the rewrite. That requires creating a loc variable with the same broadcast pattern as the RV. The "issue" boils down to:

(at.vector() + at.random.normal(size=1)).type
# TensorType(float64, vector)

(at.random.normal(loc=at.vector(), size=1)).type
# TensorType(float64, (True,))

It feels like the first expression should be replaceable by the second, but they have different broadcastable patterns.


@pytest.mark.parametrize("scalar_loc", (True, False))
@pytest.mark.parametrize("rv_size", [None, 1, 2, (3, 2)])
def test_loc_transform(scalar_loc, rv_size):
    # Incompatible parametrization
    if not scalar_loc and rv_size is None:
        return
    if scalar_loc:
        loc = at.scalar("loc")
    elif isinstance(rv_size, int):
        if rv_size == 1:
            loc = at.TensorType("floatX", (True,))("loc")
        else:
            loc = at.vector("loc")
    else:
        loc = at.matrix("loc")

@brandonwillard (Member) commented Sep 17, 2021

> Also, for comparison, I set up the similar test_loc_transform so that it passes without the additional Rebroadcast during the rewrite. That requires creating a loc variable with the same broadcast pattern as the RV. The "issue" boils down to:
>
> (at.vector() + at.random.normal(size=1)).type
> # TensorType(float64, vector)
>
> (at.random.normal(loc=at.vector(), size=1)).type
> # TensorType(float64, (True,))

Yes, that definitely seems like the/a problem! We should create an issue for this in Aesara.

Here's a little more background on the situation:

import aesara
import aesara.tensor as at


at.vector().broadcastable
# (False,)

at.random.normal(size=1).broadcastable
# (True,)

(at.vector() + at.random.normal(size=1)).broadcastable
# (False,)

(at.random.normal(loc=at.vector(), size=1)).broadcastable
# (True,)

Apparently, the last example is a bit contradictory, because the mean/loc parameter isn't necessarily broadcastable, yet the size parameter is saying that it is, and that's what's likely determining the type of the output.

@brandonwillard (Member) commented Sep 17, 2021

Actually, the last example is inconsistent; the loc=at.vector() and size=1 are contradictory, since at.vector() is a non-broadcastable array—by definition—and size=1 is stating that the output of at.random.normal is necessarily broadcastable (i.e. since size entirely determines the output's shape for scalar RandomVariables).

The second-to-last example makes sense only because the broadcastable RandomVariable output (i.e. size=1) is broadcast against the non-broadcastable at.vector.

Simply put, in the last example, size shouldn't be 1.
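For illustration, two consistent ways of writing that graph (just examples, not a proposed fix):

import aesara.tensor as at

# 1) Make the parameter's type agree with size=1 (a known length-1, broadcastable vector)
loc = at.TensorType("floatX", (True,))("loc")
at.random.normal(loc=loc, size=1).broadcastable  # (True,)

# 2) Drop size and let the non-broadcastable parameter determine the output shape
at.random.normal(loc=at.vector("loc")).broadcastable  # (False,)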

@ricardoV94, we need to address whatever it is that's giving rise to that situation.

@ricardoV94 (Contributor, Author) commented Sep 19, 2021

> @ricardoV94, we need to address whatever it is that's giving rise to that situation.

So the question in this PR is whether we want to support the derivation for this kind of graph:

loc = at.vector()
rv = at.random.normal(0, 1, size=1) + loc
vv = rv.type()

In theory loc could be of length 1 at evaluation time, which would define a "valid" RV graph.

If we want to support this, we would need to somehow make the derived rv have a vector type (one way could be through the MeasurableBroadcast I added).

If we don't want to support this, then there is no further roadblock to this PR; I would just revert the MeasurableBroadcast commit and let the rewrites fail naturally for this type of graph.

Users can still make this work by providing a valid loc, via loc = at.TensorType("floatX", (True,))("loc")

@brandonwillard (Member) commented Sep 19, 2021

> So the question in this PR is whether we want to support the derivation for this kind of graph:

No, the question is "Why are we getting inconsistent loc and size values?". As I stated above, loc=at.vector() and size=1 are not consistent parameters, because the former implies that the output should have broadcastable == (False,) and the latter implies it should have broadcastable == (True,).

As far as solutions go, we could do multiple things that don't necessarily restrict the cases covered by AePPL: e.g. rebroadcast the parameters when size == 1 (either in RandomVariable or before calling it), raise an exception stating the inconsistency, etc. Some of these approaches might still benefit from Rebroadcast canonicalizations, but they're not necessary in all cases.
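A rough sketch of the first option, rebroadcasting a parameter before calling the RV (an assumption about how it could look, not existing Aesara/AePPL behavior):

import aesara.tensor as at

loc = at.vector("loc")
size = (1,)

# Broadcast the parameter to the shape implied by size before constructing the
# RV, so the parameter and size can no longer silently disagree.
loc_bcast = at.broadcast_to(loc, size)
y_rv = at.random.normal(loc_bcast, 1.0, size=size)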

@ricardoV94 (Contributor, Author) commented Sep 24, 2021

Alright, I think I am going to leave the graph at.vector("loc") + at.random.normal(size=1, name="base_rv") unsupported for the time being.


@ricardoV94 (Contributor, Author):

This one is ready for review.

I reverted the Rebroadcast for now. In a subsequent PR we could add a rewrite that "lifts" a Rebroadcast to the RV inputs or changes the RV size, so that we can output rebroadcasted MeasureVariables from our rewrites (or apply it directly in the rewrites)

@brandonwillard (Member) left a comment

Looks like this needs to be rebased.

@ricardoV94 (Contributor, Author):

With #129 this PR is now much more straightforward! Ready for review

@ricardoV94 (Contributor, Author):

Hmm, does the IR "reverter" work in this case where nodes are removed instead of replaced?

@ricardoV94 (Contributor, Author):

I had to change strategy and move away from the transform machinery, because I couldn't see a way to incorporate inputs that are not part of the base measurable variable, which is needed for the add and mul cases.


Labels

enhancement, important, rv-transforms


Development

Successfully merging this pull request may close these issues.

Add affine transform support to total_logprob
