Unexpected behavior with pm.sample(var_names=...) #7258

@tomicapretto

Description

PyMC 5.13 includes a `var_names` parameter in `pm.sample()`. The documentation describes it as "var_names: Names of variables to be stored in the trace. Defaults to all free variables and deterministics."

This comes in very handy for something I've been trying to do in Bambi. While porting Bambi to use this feature, I noticed odd results in the tests, so I reproduced one of the models in plain PyMC and saw the problem there too. Have a look at this:

```python
import arviz as az
import numpy as np
import pymc as pm

batch = np.array(
    [
        1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  4,  4,  4,  4,  5,  5,  5,
        6,  6,  6,  7,  7,  7,  7,  8,  8,  8,  9,  9, 10, 10, 10
    ]
)
temp = np.array(
    [
        205, 275, 345, 407, 218, 273, 347, 212, 272, 340, 235, 300, 365,
        410, 307, 367, 395, 267, 360, 402, 235, 275, 358, 416, 285, 365,
        444, 351, 424, 365, 379, 428
    ]
)

y = np.array(
    [
        0.122, 0.223, 0.347, 0.457, 0.08 , 0.131, 0.266, 0.074, 0.182,
        0.304, 0.069, 0.152, 0.26 , 0.336, 0.144, 0.268, 0.349, 0.1  ,
        0.248, 0.317, 0.028, 0.064, 0.161, 0.278, 0.05 , 0.176, 0.321,
        0.14 , 0.232, 0.085, 0.147, 0.18
    ]
)

# Map batch labels (1, ..., 10) to 0-based integer indices for the "batch" dim
batch_values, batch_idx = np.unique(batch, return_inverse=True)

coords = {
    "batch": batch_values
}

with pm.Model(coords=coords) as model:
    b_batch = pm.Normal("b_batch", dims="batch")
    b_temp = pm.Normal("b_temp")
    mu = pm.Deterministic("mu", pm.math.invlogit(b_batch[batch_idx] + b_temp * temp))
    kappa = pm.Gamma("kappa", alpha=2, beta=2)

    # Beta likelihood in mean/precision form: mean mu, precision kappa
    alpha = mu * kappa
    beta = (1 - mu) * kappa

    pm.Beta("y", alpha=alpha, beta=beta, observed=y)
```
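For reference, the `y` node is the only term in this model that depends on the data. The following is a minimal NumPy sketch of that log-likelihood (Beta in mean/precision form, `alpha = mu * kappa`, `beta = (1 - mu) * kappa`), which can be handy for checking by hand whether a given draw is consistent with the data. It is an illustration written under these assumptions, not PyMC's implementation; `invlogit` and `beta_loglik` are hypothetical helpers, and the example rows are a few values taken from the data above.

```python
import math

import numpy as np


def invlogit(x):
    # Logistic function, the NumPy counterpart of pm.math.invlogit
    return 1.0 / (1.0 + np.exp(-x))


# Element-wise log-gamma. math.lgamma fails at 0, so this sketch
# assumes alpha and beta stay strictly positive.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def beta_loglik(y, b_batch, b_temp, kappa, batch_idx, temp):
    """Hypothetical helper: summed log-density of y ~ Beta(mu*kappa, (1-mu)*kappa)."""
    mu = invlogit(b_batch[batch_idx] + b_temp * temp)
    alpha = mu * kappa
    beta = (1.0 - mu) * kappa
    log_norm = _lgamma(alpha + beta) - _lgamma(alpha) - _lgamma(beta)
    terms = log_norm + (alpha - 1.0) * np.log(y) + (beta - 1.0) * np.log1p(-y)
    return float(np.sum(terms))


# With all coefficients at zero and kappa = 2, mu = 0.5 and the likelihood
# is Beta(1, 1), i.e. uniform on (0, 1), so the summed log-density is 0.
batch = np.array([1, 1, 2, 2, 3])
batch_values, batch_idx = np.unique(batch, return_inverse=True)
temp = np.array([205.0, 275.0, 218.0, 273.0, 212.0])
y = np.array([0.122, 0.223, 0.08, 0.131, 0.074])
print(beta_loglik(y, np.zeros(len(batch_values)), 0.0, 2.0, batch_idx, temp))
```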

I want to sample the posterior, but by default I don't want to store the draws of `mu`. So I use `var_names=["b_batch", "b_temp", "kappa"]` (and, to see the difference, I also sample without `var_names`).

```python
with model:
    idata_1 = pm.sample(random_seed=1234)
    idata_2 = pm.sample(var_names=["b_batch", "b_temp", "kappa"], random_seed=1234)
```
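Since `var_names` is documented to default to all free variables and deterministics, the list passed above is simply that default minus `mu`. When models are generated programmatically (as in Bambi), such a list could be built rather than typed by hand. The helper below is a hypothetical sketch that works on plain name lists; it does not call any PyMC API.

```python
def default_var_names_except(free_rv_names, deterministic_names, exclude):
    """Hypothetical helper: the documented var_names default (free variables
    plus deterministics) minus the names in `exclude`, order preserved."""
    excluded = set(exclude)
    return [name for name in [*free_rv_names, *deterministic_names] if name not in excluded]


# Names as they appear in the model above
print(default_var_names_except(["b_batch", "b_temp", "kappa"], ["mu"], exclude=["mu"]))
```

With a PyMC model in scope, the two input lists could come from `[v.name for v in model.free_RVs]` and `[v.name for v in model.deterministics]`.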

When I don't use `var_names`, I get the following posterior:

```python
az.plot_trace(idata_1, backend_kwargs={"layout": "constrained"});
```

[image: trace plot of `idata_1`, sampled without `var_names`]

and when I use `var_names`, I get this:

```python
az.plot_trace(idata_2, backend_kwargs={"layout": "constrained"});
```

[image: trace plot of `idata_2`, sampled with `var_names`]

This makes me think the sampler is effectively omitting the likelihood and therefore sampling from the prior.
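One way to put a number on "sampling from the prior" instead of eyeballing trace plots is posterior contraction, `1 - Var(posterior) / Var(prior)`: values near 0 mean the data barely informed the parameter, values near 1 mean strong updating. The function below is a hypothetical NumPy sketch that assumes the prior variance is known analytically (for example, `b_temp ~ Normal(0, 1)` has prior variance 1). The draws in the example are synthetic and only illustrate the two extremes.

```python
import numpy as np


def posterior_contraction(draws, prior_var):
    """Hypothetical helper: 1 - Var(posterior draws) / Var(prior).

    Roughly 0 when the draws look like the prior, close to 1 when the
    likelihood has substantially narrowed the distribution.
    """
    draws = np.asarray(draws, dtype=float).ravel()  # pool chains and draws
    return 1.0 - draws.var() / prior_var


rng = np.random.default_rng(1234)
prior_like = rng.normal(0.0, 1.0, size=4000)  # synthetic draws shaped like the N(0, 1) prior
data_informed = rng.normal(0.01, 0.001, size=4000)  # synthetic, much narrower "posterior"
print(posterior_contraction(prior_like, prior_var=1.0))
print(posterior_contraction(data_informed, prior_var=1.0))
```

Applied to the traces above, the input would be something like `idata_2.posterior["b_temp"].values` (shape chains × draws), with the same call on `idata_1` for comparison.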

Is this behavior expected?
