
Conversation

@kratsg
Contributor

@kratsg kratsg commented Apr 8, 2022

Description

Provides functionality for clipping expected_data in the _MainModel via pyhf.Model(clip_sample_data=0.0, clip_bin_data=0.0) keyword arguments. Added to the docstrings.

Backwards compatibility is preserved: by default, no clipping is done. There does not seem to be a performance hit from checking whether or not to clip. Clipping can be done at two levels:

  • the level of the individual samples (perhaps useful if we need the granularity and a specific sample is providing large negative yields)
  • the level of the bin-by-bin data (this is primarily the expected use case that makes the most sense? Maybe...)

Note that ROOT seems to primarily clip at the level of the individual samples, so we add both options and leave it up to the user to decide whether to use this. If all but one of your samples are fine and one sample provides large negative yields, you can either zero out that sample or zero out the affected bins.

See documentation changes here: https://pyhf--1845.org.readthedocs.build/en/1845/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model
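As a rough illustration of the difference between the two clipping levels described above (the numbers and the per-sample array layout here are assumed for this sketch, not taken from pyhf internals):

```python
import numpy as np

# Hypothetical per-sample expected yields, shape (n_samples, n_bins).
sample_yields = np.array([
    [5.0, -2.0, 3.0],   # a sample with a large negative yield in bin 1
    [1.0,  1.5, 0.5],
])

# clip_sample_data-style clipping: clip each sample's yields, then sum.
clipped_per_sample = np.clip(sample_yields, 0.0, None).sum(axis=0)

# clip_bin_data-style clipping: sum first, then clip the total per bin.
clipped_per_bin = np.clip(sample_yields.sum(axis=0), 0.0, None)
```

Note the two orders of operations give different results: per-sample clipping yields `[6.0, 1.5, 3.5]` here, while per-bin clipping yields `[6.0, 0.0, 3.5]`, since the negative sample is partially compensated by the other sample before the bin-level clip.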

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* API change to allow the _MainModel to clip expected data at the sample-wise
or bin-wise level.
   - Should help make Minuit a little more robust in particular cases where the
     workspace provided has regions of phase-space where the model is not
     well-behaved (negative yields).
   - Add warning to initialization of _MainModel if clipping is used to ensure that users are
     aware of what they are doing.
* Add tests for clipping by-sample and clipping by-bin.

@codecov

codecov bot commented Apr 8, 2022

Codecov Report

Merging #1845 (c72af40) into master (86229ad) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1845   +/-   ##
=======================================
  Coverage   98.17%   98.17%           
=======================================
  Files          68       68           
  Lines        4322     4331    +9     
  Branches      726      730    +4     
=======================================
+ Hits         4243     4252    +9     
  Misses         46       46           
  Partials       33       33           
Flag Coverage Δ
contrib 26.34% <16.66%> (-0.06%) ⬇️
doctest 60.63% <33.33%> (-0.11%) ⬇️
unittests-3.10 96.05% <100.00%> (+<0.01%) ⬆️
unittests-3.7 96.03% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/pyhf/pdf.py 97.86% <100.00%> (+0.06%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86229ad...c72af40.

@kratsg kratsg force-pushed the feat/clipPoissonRates branch from 048830d to 7093bdb on April 11, 2022 18:13
@kratsg kratsg force-pushed the feat/clipPoissonRates branch from 7093bdb to be63893 on April 20, 2022 19:32
@kratsg kratsg self-assigned this Apr 20, 2022
@kratsg kratsg added the feat/enhancement (New feature or request), research (experimental stuff), optimization, API (Changes the public API), and user request (Request coming from a pyhf user) labels Apr 20, 2022
@kratsg kratsg marked this pull request as ready for review April 20, 2022 20:10
@kratsg
Contributor Author

kratsg commented Apr 20, 2022

/cc @mdhank

@kratsg
Contributor Author

kratsg commented Apr 20, 2022

@HDembinski - this is a possibly interesting edge case for Minuit. Minuit has convergence issues on an (admittedly hacked-together) workspace that throws negative yields. If we explicitly add clipping to make sure the data is non-negative during minimization, there's no issue. (SciPy seems to be "robust" regardless.)

Member

@matthewfeickert matthewfeickert left a comment


@kratsg Overall LGTM. 👍 Thanks!

I just have a question on the tolerances, but beyond that this seems good. Also is there any associated GitHub Issue or Discussion for this? Or was this only brought to our attention through email?

@matthewfeickert matthewfeickert added the tests (pytest) label Apr 21, 2022
@matthewfeickert
Member

(We can ignore that Semantic Pull Request isn't reporting at the moment given zeke/semantic-pull-requests#183 and just visually check. This PR is fine. 👍)

@HDembinski
Member

HDembinski commented Apr 21, 2022

This is not a Minuit issue. You should not have negative predictions for bin entries in a maximum likelihood fit, ever. Negative expectations are mathematically forbidden. Construct the model and limit the model parameters so that this never happens. If scipy handles this better, it is by chance.

The main sources of negative expectations are using Chebychev polynomials for the background instead of Bernstein polynomials, and allowing signal or background fractions to be negative.

There is a related study in the iminuit docs about using scipy as an optimiser and doing a constrained fit. I can't link it here because I am on the phone.

@matthewfeickert
Member

There is a related study in the iminuit docs about using scipy as an optimiser and doing a constrained fit. I can't link it here because I am on the phone.

(Linking here before I sign off for the night) I think the study is this one: SciPy minimizers and constraints (which is the scipy_and_constraints.ipynb notebook in the repo).

@HDembinski
Member

HDembinski commented Apr 21, 2022

@matthewfeickert Thank you for adding the link, yes, that's the one I was referring to.

To add to my previous comment: this workaround is not going to work well, because by clipping the predictions you introduce discontinuities in the cost function. Minuit and other gradient-based minimisers expect the cost function to be analytic: an infinitesimal change in the parameters produces an infinitesimal change in the value of the cost function. When you clip, this is no longer the case.

You may think that you can use a gradient-free minimiser as a way out, but in general that does not work either. If the algorithm converges to a minimum where some of the predictions are clipped, then the estimates found are no longer maximum-likelihood estimates (because you did not mathematically maximize a likelihood function). In general, this means you lose one of the key properties of a maximum-likelihood fit: that the covariance matrix of the fitted parameters can be computed from the matrix of second derivatives (if second derivatives can be computed at all despite the discontinuities of the cost function). You are only OK if the minimum that is found does not have any of the predictions clipped.
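A toy sketch of the plateau being described here (the model `lam = 1 + mu` and all numbers are assumed purely for illustration): once the expectation is clipped, the cost function stops responding to the parameter, so a gradient-based minimiser sees a flat region.

```python
import math

def nll(lam, n=3):
    # Poisson negative log-likelihood term (up to a constant) for observed n
    return lam - n * math.log(lam)

def clipped_nll(mu, n=3, eps=1e-6):
    # Toy model: expectation lam = 1 + mu goes negative for mu < -1;
    # clipping pins it at eps, flattening the cost function there.
    lam = max(1.0 + mu, eps)
    return nll(lam, n)

def grad(f, x, h=1e-4):
    # central finite-difference derivative
    return (f(x + h) - f(x - h)) / (2 * h)

# In the unclipped region the gradient is informative (about -2 at mu=0),
# but in the clipped region it vanishes: the minimiser sees a plateau.
print(grad(clipped_nll, 0.0), grad(clipped_nll, -2.0))
```

The kink at the clipping boundary (here `mu = -1`) is exactly the discontinuity in the derivative that breaks the analyticity assumption.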

@kratsg
Contributor Author

kratsg commented Apr 21, 2022

@HDembinski, in general I would agree that a well-designed likelihood should not spit out negative yields. I think it happens when you have a lot of shape systematics that are uncorrelated but pulled in the same direction. Limiting the model parameters manually would make the most sense if your model is simple enough (some of the issues from within ATLAS involve models with > 100 parameters, so it becomes rather intractable to hunt down all the problematic parameters).

In ROOT, I will point out that this is enabled by default, which means ROOT is using Minuit with yields forced to be positive. I think it would be useful to understand why SciPy seems to be able to converge in these cases while Minuit [which is more strict and careful] will rightfully complain.

Does this indicate, even with the comparison studies you've done, that SciPy is wrong and should be more strict? Or is the argument you're making that for poorly-defined workspaces, you throw everything out of the window when it comes to comparing Minuit and SciPy?

@kratsg
Contributor Author

kratsg commented Apr 21, 2022

To add to my previous comment: this workaround is not going to work well, because by clipping the predictions you introduce discontinuities in the cost function, but Minuit and other gradient-based minimisers expect that the cost function is analytical, an infinitesimal changes in the parameters produces an infinitesimal change in the value of the cost function. When you clip, this is no longer the case.

Yeah, this is actually the reason why I and others like @alexander-held are surprised to see this turned on by default in ROOT; we're not entirely sure why. We can add specific documentation in the docstring to say that this option is "dangerous" (or that users should be careful trusting any results that need this option).

@HDembinski
Member

@HDembinski , In general, I would agree that well-designed likelihood should not spit out negative yields. I think it happens when you have a lot of shape systematics that are uncorrelated but pulled in the same direction. I think limiting the model parameters manually would make the most sense if your model is simple enough (some of the issues from within ATLAS correspond with models with > 100 parameters, so it becomes slightly more intractable to hunt down all problematic parameters).

I think it is not so difficult, because most of the parameters are shape parameters. You can change those as you like; it will not generate negative bin expectations. You may get zero expectation in some bin, which is also bad, but easier to work around. You cannot work around negative expectations. The main problem is the use of Chebychev polynomials for the background and allowing amplitudes of pdfs to be negative. Both are bad traditions that we need to break away from.

In ROOT, I will point out that this is enabled by default, which means ROOT is using Minuit with the forced positive-definite.

Can you please link to that code in ROOT, because I don't know what you mean? You talk about positive-definiteness, but that is a different thing from what I am talking about. Positive-definiteness means that the curvature of the cost function is positive in all directions. The matrix of second derivatives has to be positive-definite for Minuit/Migrad both in ROOT and in iminuit.

I think it would be useful to understand why SciPy seems to be able to converge in these cases while Minuit [which is more strict and careful] will rightfully complain.

Minuit's Migrad algorithm is not more strict and careful. On the contrary, Migrad as implemented in Minuit2, is less robust than some of the algorithms in SciPy in my experience. Migrad cannot recover well from situations where the cost function returns NaN. I think there is a potential for improvement there, but it is not an easy fix.

Does this indicate, even with the comparison studies you've done, that SciPy is wrong and should be more strict? Or is the argument you're making that for poorly-defined workspaces, you throw everything out of the window when it comes to comparing Minuit and SciPy?

My comparisons indicate that sometimes Migrad works better than other minimisers and sometimes the other minimisers work better. It depends on the specific problem. My argument is that people need to define their models properly and there is little that you can do on the level of pyhf to fix bad models constructed by users. Another (unrelated) common problem is that users generate models with strongly correlated parameters. There is also nothing you can do in pyhf to prevent that. You should educate your users about this as I am doing.

@kratsg
Contributor Author

kratsg commented Apr 26, 2022

Can you please link to that code in ROOT, because I don't know what you mean?

Hans, thanks for being patient and sorry for not responding more quickly. I will quote the email from @alexander-held, but this was pointed out by @mdhank, who did the digging and found the relevant code:

Michael pointed me to https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/PiecewiseInterpolation.cxx#L297-L305. ROOT implements a way to force positive-definite predictions. This seems to be used by default: in HistoToWorkspaceFactoryFast::MakeLinInterpWithConstraint [1], which is used in HistoToWorkspaceFactoryFast::MakeSingleChannelWorkspace [2], which is called via HistoToWorkspaceFactoryFast::MakeSingleChannelModel from RooStats::HistFactory::MakeModelAndMeasurementFast. It naively does not look configurable to me either, so I assume most/all HistFactory models in ATLAS would be using this.

[1] https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/HistoToWorkspaceFactoryFast.cxx#L468
[2] https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/HistoToWorkspaceFactoryFast.cxx#L1318
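For orientation, a conceptual Python sketch of the behavior the linked ROOT code appears to implement (this is an assumption from reading the linked lines, not a transcription of the C++): when the positive-definite flag is set, a negative interpolated yield is simply replaced by zero.

```python
def root_style_clip(interpolated_sum, positive_definite=True):
    # If the positive-definite flag is set and the interpolated yield
    # came out negative, floor it at zero; otherwise pass it through.
    if positive_definite and interpolated_sum < 0.0:
        return 0.0
    return interpolated_sum

print(root_style_clip(-3.2), root_style_clip(4.1))  # 0.0 4.1
```

This corresponds to the bin-level clipping option in this PR, applied after interpolation rather than per sample.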

You talk about positive-definiteness, but that is a different thing from what I am talking about. Positive-definiteness means that the curvature of the cost function is positive in all directions. The matrix of second derivatives has to be positive-definite for Minuit/Migrad both in ROOT and in iminuit.

I agree. I'm probably making this a bit confusing because "positive-definite" is the name that ROOT seems to have chosen, but looking at the code, it doesn't seem to be related to derivatives or the cost function, but rather simply to the yields. When I refer to "positive-definite" here, I'm using ROOT's phrasing rather than my understanding of what it means in minimization, which is what you explained. That is why I chose a different name for the option in pyhf: this isn't changing or touching the second derivatives.

Minuit's Migrad algorithm is not more strict and careful. On the contrary, Migrad as implemented in Minuit2, is less robust than some of the algorithms in SciPy in my experience. Migrad cannot recover well from situations where the cost function returns NaN. I think there is a potential for improvement there, but it is not an easy fix.

Noted, I think I understand where you're coming from. I still would like to provide this functionality in pyhf to at least match what exists in ROOT, but keep it as non-default. I do want to add a more verbose description in the documentation to clarify the expectations of the minimizers and of the users, to make sure people understand that it is not up to pyhf to enforce that workspaces are well-defined, but we'll do our best.

@alexander-held
Member

I think it is not so difficult, because most of the parameters are shape parameters. You can change those as you like, it will not generate negative bin expectations.

In the context of pyhf, "shape" variations are encoded by histosys modifiers, which leave the total yield across all bins in a channel unchanged. If the shape variation is sufficiently strong, a change of the related parameter can drive the predicted yield in some bins negative (compensating for increases in yield in other bins). An example of that is in this gist for the case of two histosys modifiers; the same can also happen for a single histosys modifier if the parameter is pulled beyond one sigma. A way to address this in practice is the use of normsys modifiers that provide exponential extrapolation, but any remaining shape effects can still cause this behavior in principle.
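A small numeric sketch of this mechanism (the yields and the piecewise-linear interpolation here are assumed for illustration; pyhf's actual histosys interpolation codes differ in detail): a shape variation that conserves the channel total can still drive one bin negative once its parameter is pulled hard enough.

```python
nominal = [10.0, 10.0]
up      = [18.0,  2.0]   # same total yield as nominal, different shape

def interpolate(alpha):
    # simple linear extrapolation of the up variation:
    # alpha = 0 gives nominal, alpha = 1 gives the up template
    return [n + alpha * (u - n) for n, u in zip(nominal, up)]

print(interpolate(2.0))  # [26.0, -6.0]: total conserved, bin 1 negative
```

At `alpha = 2` the total yield is still 20, matching the nominal, but bin 1 has been driven to -6: exactly the situation the clipping options guard against.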

@alexander-held
Member

#1884 might also be of interest in this context: it's not specifically about protecting against negative yields, but protecting against NaNs due to ill-defined templates. I don't know where/how exactly ROOT protects against this.

@matthewfeickert matthewfeickert force-pushed the feat/clipPoissonRates branch from 3f35195 to eec1b82 on June 20, 2022 22:51
Member

@matthewfeickert matthewfeickert left a comment


I still would like to provide this functionality in pyhf to at least match what exists in ROOT, but keep it as non-default. I do want to add a more verbose description in the documentation to clarify the expectations of the minimizers and of the users, to make sure people understand that it is not up to pyhf to enforce that workspaces are well-defined, but we'll do our best.

@kratsg (thanks for your patience and I hope that I never do such a poor job of responding to you again) I think this is a reasonable path forward for now. Though to avoid more delays on this I think making a dedicated docs Issue for the descriptions/discussions would be a good idea.

Member

@alexander-held alexander-held left a comment


I have not tested this very systematically, but have used this for a few problematic workspaces in the past few weeks and have not run into issues with the implementation. It does achieve the intended effect in the examples I have run.

@mdhank

mdhank commented Jun 21, 2022

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@matthewfeickert
Member

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@mdhank Thanks for testing this out. I'm quite behind on things at the moment (this PR is a testament to that), but if you have examples of where this is helping, can you share them?

@matthewfeickert matthewfeickert merged commit 474c7e3 into master Jun 22, 2022
@matthewfeickert matthewfeickert deleted the feat/clipPoissonRates branch June 22, 2022 23:05
matthewfeickert added a commit that referenced this pull request Jun 23, 2022
* Update lower bound on iminuit to v2.7.0 as it is the 'oldest' SemVer release
that has the behavior:
> Minuit.hesse no longer sets the status of the FunctionMinimum unconditionally
> to valid if it was invalid before.

Without this fix, the situation can arise where `Minuit.hesse()` can cause
`Minuit.valid` to be True when it should be False.
* Update tests/constraints.txt to use iminuit==2.7.0.
* Amends PR #1845
@mdhank

mdhank commented Jun 23, 2022

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@mdhank Thanks for testing this out. I'm quite behind on things at the moment (this PR is a testament to that), but if you have examples of where this is helping, can you share them?

The workspaces in question contain non-public ATLAS results, but I have sent versions of them to the pyhf developers via e-mail in March-April.

