
Strange Aesara warning and broken pipe error when fitting model with missing values #5339

@fonnesbeck

Description


I'm fitting a manually constructed Gaussian random walk (GRW) model in v4 beta that includes a likelihood with many missing values. When the model is fit with pm.sample, I get several of these warnings:

/Users/cfonnesbeck/GitHub/pie/.env/lib/python3.8/site-packages/aeppl/joint_logprob.py:161: UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: halfnormal_rv{0, (0, 0), floatX, False}(RandomStateSharedVariable(<RandomState(MT19937) at 0x16E5E6640>), TensorConstant{[]}, TensorConstant{11}, BroadcastTo.0, BroadcastTo.0)
  warnings.warn(

The model is below; the node the warning refers to is the likelihood standard deviation, sigma. If the missing values are filled in, the model runs normally.

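import numpy as np
import pymc as pm

# pitcher, age: user-supplied pandas objects; y: pitcher-by-age array with NaN for missing values (assumed)
coords = {
    'pitcher': pitcher.values,
    'age': age.values.astype(int)
}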
with pm.Model(coords=coords) as age_model:

    z_mu = pm.Normal('z_mu', mu=0, sigma=1, dims='pitcher')
    s_mu = pm.HalfCauchy('s_mu', 3)
    m_mu = pm.Normal('m_mu', mu=92, sigma=5)
    mu = pm.Deterministic('mu', m_mu + s_mu * z_mu)

    # GRW: cumulative sum of independent increments (the first is nearly fixed at 0)
    rho = pm.Normal('rho', mu=0, sigma=np.append(0.001, np.ones(len(age)-1)))
    age_curve = pm.Deterministic('age_curve', rho.cumsum(), dims='age')

    theta = mu.dimshuffle(0, 'x') + age_curve

    sigma = pm.HalfNormal('sigma', 10)
    velo = pm.Normal('velo', theta, sigma=sigma, observed=y)
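For readers unfamiliar with the hand-built GRW: rho holds the increments (the first with standard deviation 0.001, so the curve starts near zero at the first age), rho.cumsum() turns them into a random walk over ages, and mu.dimshuffle(0, 'x') + age_curve broadcasts to a pitcher-by-age matrix, which is presumably how y is laid out. A minimal pure-Python sketch of one prior draw of that structure; the function names are hypothetical and this is not PyMC code.

```python
import itertools
import random


def grw_age_curve(n_ages, first_sigma=0.001, step_sigma=1.0, seed=None):
    """Draw one prior realisation of the hand-built GRW: age_curve = cumsum(rho)."""
    rng = random.Random(seed)
    # The first increment has a tiny scale, so the curve starts close to zero;
    # every later increment has scale step_sigma.
    sigmas = [first_sigma] + [step_sigma] * (n_ages - 1)
    rho = [rng.gauss(0.0, s) for s in sigmas]
    # The cumulative sum of independent increments is a random walk over ages.
    age_curve = list(itertools.accumulate(rho))
    return rho, age_curve


def broadcast_theta(mu, age_curve):
    """Mimic mu.dimshuffle(0, 'x') + age_curve: rows are pitchers, columns are ages."""
    return [[m + a for a in age_curve] for m in mu]


rho, curve = grw_age_curve(5, seed=1)
theta = broadcast_theta([90.0, 92.5, 95.0], curve)
print(len(theta), len(theta[0]))  # 3 5
```

Fixing the first increment's scale near zero (rather than dropping it) keeps rho the same length as the age coordinate, so age_curve lines up with dims='age'.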

When the script is run, sampling quickly dies with a broken pipe, apparently related to multiprocessing:

/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/model.py:1301: ImputationWarning: Data in velo contains missing values and will be automatically imputed from the sampling distribution.
  warnings.warn(impute_message, ImputationWarning)
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_full...
/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/step_methods/hmc/quadpotential.py:611: UserWarning: QuadPotentialFullAdapt is an experimental feature
  warnings.warn("QuadPotentialFullAdapt is an experimental feature")
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [z_mu, s_mu, m_mu, rho, velo_missing]
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/forkserver.py", line 280, in main
    code = _serve_one(child_r, fds,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/forkserver.py", line 319, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/cfonnesbeck/GitHub/pie/research/projections/src/pitchers/ageing_curves.py", line 115, in <module>
    trace = pm.sample(1000, tune=1000, init='adapt_full')
  File "/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/sampling.py", line 566, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/sampling.py", line 1468, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py", line 413, in __init__
    self._samplers = [
  File "/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py", line 414, in <listcomp>
    ProcessAdapter(
  File "/Users/cfonnesbeck/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py", line 282, in __init__
    self._process.start()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

---------------------------------------------------------------------------
BrokenPipeError                           Traceback (most recent call last)
~/GitHub/pie/research/projections/src/pitchers/ageing_curves.py in <module>
      113 with age_model:
      114 
----> 115     trace = pm.sample(1000, tune=1000, init='adapt_full')
      116 

~/GitHub/pie/.env/src/pymc/pymc/sampling.py in sample(draws, step, init, n_init, initvals, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, **kwargs)
    564         _print_step_hierarchy(step)
    565         try:
--> 566             trace = _mp_sample(**sample_args, **parallel_args)
    567         except pickle.PickleError:
    568             _log.warning("Could not pickle model, sampling singlethreaded.")

~/GitHub/pie/.env/src/pymc/pymc/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, **kwargs)
   1466         traces.append(strace)
   1467 
-> 1468     sampler = ps.ParallelSampler(
   1469         draws,
   1470         tune,

~/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar, mp_ctx)
    411             step_method_pickled = cloudpickle.dumps(step_method, protocol=-1)
    412 
--> 413         self._samplers = [
    414             ProcessAdapter(
    415                 draws,

~/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py in <listcomp>(.0)
    412 
    413         self._samplers = [
--> 414             ProcessAdapter(
    415                 draws,
    416                 tune,

~/GitHub/pie/.env/src/pymc/pymc/parallel_sampling.py in __init__(self, draws, tune, step_method, step_method_pickled, chain, seed, start, mp_ctx)
    280             ),
    281         )
--> 282         self._process.start()
    283         # Close the remote pipe, so that we get notified if the other
    284         # end is closed.

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    289         def _Popen(process_obj):
    290             from .popen_forkserver import Popen
--> 291             return Popen(process_obj)
    292 
    293     class ForkContext(BaseContext):

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_forkserver.py in __init__(self, process_obj)
     33     def __init__(self, process_obj):
     34         self._fds = []
---> 35         super().__init__(process_obj)
     36 
     37     def duplicate_for_child(self, fd):

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     20 
     21     def duplicate_for_child(self, fd):

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_forkserver.py in _launch(self, process_obj)
     56                                        (_parent_w, self.sentinel))
     57         with open(w, 'wb', closefd=True) as f:
---> 58             f.write(buf.getbuffer())
     59         self.pid = forkserver.read_signed(self.sentinel)
     60 

BrokenPipeError: [Errno 32] Broken pipe
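The first traceback suggests what goes wrong: the chain processes are started with the forkserver start method (see popen_forkserver in the trace), so each child re-runs ageing_curves.py while bootstrapping, reaches the module-level pm.sample(...) call on line 115, and tries to start processes of its own. That raises the RuntimeError, and the child presumably exits before the parent finishes writing to it, hence the BrokenPipeError. The error message recommends guarding the entry point with if __name__ == '__main__':. A minimal standard-library sketch of that idiom; run_chain and sample_in_parallel are hypothetical stand-ins, not PyMC code, and "spawn" is used because, like forkserver, it re-imports the main module but is available on every platform.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def run_chain(seed):
    """Hypothetical stand-in for one MCMC chain (not PyMC code)."""
    return seed * seed


def sample_in_parallel(seeds, start_method="spawn"):
    # With 'spawn' or 'forkserver', each child re-imports the main module,
    # so this must only be called from under the __main__ guard below.
    ctx = mp.get_context(start_method)
    with ProcessPoolExecutor(max_workers=len(seeds), mp_context=ctx) as pool:
        return list(pool.map(run_chain, seeds))


if __name__ == "__main__":
    # Only code inside this block is skipped when a child re-imports the module.
    print(sample_in_parallel([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

In the reporter's script, the analogous (untested) change would be to keep the model definition at module level and move `with age_model: trace = pm.sample(1000, tune=1000, init='adapt_full')` under the guard. This is consistent with the report that a single chain avoids the error, presumably because no worker processes are started in that case.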

Running with a single chain avoids the error (but not the warning).
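The warning only appears when y contains missing values. Per the ImputationWarning in the log, PyMC then imputes those entries from the sampling distribution, and the NUTS line lists them as a separate free variable, velo_missing, sampled alongside z_mu, s_mu, m_mu and rho. A rough pure-Python sketch of that split, assuming a pitcher-by-age table with NaN (or None) for missing cells; split_observed is a hypothetical helper, the values are toy numbers, and this is not PyMC's implementation.

```python
import math


def split_observed(y):
    """Split a 2-D pitcher-by-age table into observed values and missing positions."""
    observed, missing = [], []
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            if value is None or math.isnan(value):
                missing.append((i, j))  # one imputed unknown per missing cell
            else:
                observed.append(((i, j), value))
    return observed, missing


# Toy values for illustration only.
y = [
    [91.2, float("nan"), 90.8],
    [float("nan"), 94.0, float("nan")],
]
observed, missing = split_observed(y)
print(len(observed), missing)  # 3 [(0, 1), (1, 0), (1, 2)]
```

With many missing values, the missing-position list (and hence the number of imputed unknowns) grows accordingly.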

Versions and main components

  • PyMC/PyMC3 Version: 4.0.0b1
  • Aesara/Theano Version: Aesara 2.3.2
  • Python Version: 3.8.9
  • Operating system: macOS
  • How did you install PyMC/PyMC3: miniforge
