
Conversation

@kratsg
Contributor

@kratsg kratsg commented Apr 8, 2022

Description

Provides functionality for clipping expected_data in the _MainModel via pyhf.Model(clip_sample_data=0.0, clip_bin_data=0.0) keyword arguments. Added to the docstrings.

Backwards compatibility is preserved: by default, no clipping is done. There does not seem to be a performance hit from checking whether or not to clip. Clipping can be done at two levels:

  • the level of the individual samples (perhaps useful if we need the granularity and a specific sample is providing large negative yields)
  • the level of the bin-by-bin data (this is primarily the expected use case that makes the most sense? Maybe...)

Note that ROOT seems to primarily clip at the level of the individual samples, so we add both options and leave it up to the user to decide whether to use this. If all but one of your samples are fine and one sample provides large negative yields, you can either zero out that sample or zero out the affected bins.

See documentation changes here: https://pyhf--1845.org.readthedocs.build/en/1845/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model
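As a rough illustration of the difference between the two clipping levels described above (the numbers and the per-sample array layout here are assumed for this sketch, not taken from pyhf internals):

```python
import numpy as np

# Hypothetical per-sample expected yields, shape (n_samples, n_bins).
sample_yields = np.array([
    [5.0, -2.0, 3.0],   # a sample with a large negative yield in bin 1
    [1.0,  1.5, 0.5],
])

# clip_sample_data-style clipping: clip each sample's yields, then sum.
clipped_per_sample = np.clip(sample_yields, 0.0, None).sum(axis=0)

# clip_bin_data-style clipping: sum first, then clip the total per bin.
clipped_per_bin = np.clip(sample_yields.sum(axis=0), 0.0, None)
```

Note the two orders of operations give different results: per-sample clipping yields `[6.0, 1.5, 3.5]` here, while per-bin clipping yields `[6.0, 0.0, 3.5]`, since the negative sample is partially compensated by the other sample before the bin-level clip.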

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* API change to allow the _MainModel to clip expected data at the sample-wise
or bin-wise level.
   - Should help make Minuit a little more robust in particular cases where the
     workspace provided has regions of phase-space where the model is not
     well-behaved (negative yields).
   - Add warning to initialization of _MainModel if clipping is used to ensure that users are
     aware of what they are doing.
* Add tests for clipping by-sample and clipping by-bin.

@codecov

codecov bot commented Apr 8, 2022

Codecov Report

Merging #1845 (c72af40) into master (86229ad) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1845   +/-   ##
=======================================
  Coverage   98.17%   98.17%           
=======================================
  Files          68       68           
  Lines        4322     4331    +9     
  Branches      726      730    +4     
=======================================
+ Hits         4243     4252    +9     
  Misses         46       46           
  Partials       33       33           
Flag Coverage Δ
contrib 26.34% <16.66%> (-0.06%) ⬇️
doctest 60.63% <33.33%> (-0.11%) ⬇️
unittests-3.10 96.05% <100.00%> (+<0.01%) ⬆️
unittests-3.7 96.03% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/pyhf/pdf.py 97.86% <100.00%> (+0.06%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86229ad...c72af40.

@kratsg kratsg force-pushed the feat/clipPoissonRates branch from 048830d to 7093bdb on April 11, 2022 18:13
@kratsg kratsg force-pushed the feat/clipPoissonRates branch from 7093bdb to be63893 on April 20, 2022 19:32
@kratsg kratsg self-assigned this Apr 20, 2022
@kratsg kratsg added the feat/enhancement (New feature or request), research (experimental stuff), optimization, API (Changes the public API), and user request (Request coming from a pyhf user) labels Apr 20, 2022
@kratsg kratsg marked this pull request as ready for review April 20, 2022 20:10
@kratsg
Contributor Author

kratsg commented Apr 20, 2022

/cc @mdhank

@kratsg
Contributor Author

kratsg commented Apr 20, 2022

@HDembinski - this is a possibly interesting edge case for Minuit. Minuit has convergence issues on an (admittedly hacked-together) workspace that throws negative yields. If we explicitly add clipping to make sure the data is non-negative during minimization, there's no issue. (SciPy seems to be "robust" regardless.)

Member

@matthewfeickert matthewfeickert left a comment


@kratsg Overall LGTM. 👍 Thanks!

I just have a question on the tolerances, but beyond that this seems good. Also is there any associated GitHub Issue or Discussion for this? Or was this only brought to our attention through email?

@matthewfeickert matthewfeickert added the tests (pytest) label Apr 21, 2022
@matthewfeickert
Member

(We can ignore that Semantic Pull Request isn't reporting at the moment given zeke/semantic-pull-requests#183 and just visually check. This PR is fine. 👍)

@HDembinski
Member

HDembinski commented Apr 21, 2022

This is not a Minuit issue. You should not have negative predictions for bin entries in a maximum likelihood fit, ever. Negative expectations are mathematically forbidden. Construct the model and limit the model parameters so that this never happens. If scipy handles this better, it is by chance.

The main sources of negative expectations are using Chebychev polynomials for the background instead of Bernstein polynomials, and allowing signal or background fractions to be negative.

There is a related study in the iminuit docs about using scipy as an optimiser and doing a constrained fit. I can't link it here because I am on the phone.

@matthewfeickert
Member

There is a related study in the iminuit docs about using scipy as an optimiser and doing a constrained fit. I can't link it here because I am on the phone.

(Linking here before I sign off for the night) I think the study is this one: SciPy minimizers and constraints (which is the scipy_and_constraints.ipynb notebook in the repo).

@HDembinski
Member

HDembinski commented Apr 21, 2022

@matthewfeickert Thank you for adding the link, yes, that's the one I was referring to.

To add to my previous comment: this workaround is not going to work well, because by clipping the predictions you introduce discontinuities in the cost function. Minuit and other gradient-based minimisers expect the cost function to be analytic: an infinitesimal change in the parameters produces an infinitesimal change in the value of the cost function. When you clip, this is no longer the case.

You may think that you can use a gradient-free minimiser as a way out, but in general that does not work either. If the algorithm converges to a minimum where some of the predictions are clipped, then the estimates found are no longer maximum-likelihood estimates (because you did not mathematically maximize a likelihood function). In general, this means you lose one of the key properties of a maximum-likelihood fit: that the covariance matrix of the fitted parameters can be computed from the matrix of second derivatives (if second derivatives can be computed at all despite the discontinuities of the cost function). You are only OK if the minimum that is found does not have any of the predictions clipped.
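A toy sketch of the plateau being described here (the model `lam = 1 + mu` and all numbers are assumed purely for illustration): once the expectation is clipped, the cost function stops responding to the parameter, so a gradient-based minimiser sees a flat region.

```python
import math

def nll(lam, n=3):
    # Poisson negative log-likelihood term (up to a constant) for observed n
    return lam - n * math.log(lam)

def clipped_nll(mu, n=3, eps=1e-6):
    # Toy model: expectation lam = 1 + mu goes negative for mu < -1;
    # clipping pins it at eps, flattening the cost function there.
    lam = max(1.0 + mu, eps)
    return nll(lam, n)

def grad(f, x, h=1e-4):
    # central finite-difference derivative
    return (f(x + h) - f(x - h)) / (2 * h)

# In the unclipped region the gradient is informative (about -2 at mu=0),
# but in the clipped region it vanishes: the minimiser sees a plateau.
print(grad(clipped_nll, 0.0), grad(clipped_nll, -2.0))
```

The kink at the clipping boundary (here `mu = -1`) is exactly the discontinuity in the derivative that breaks the analyticity assumption.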

@kratsg
Contributor Author

kratsg commented Apr 21, 2022

@HDembinski, in general I would agree that a well-designed likelihood should not spit out negative yields. I think it happens when you have a lot of shape systematics that are uncorrelated but pulled in the same direction. Limiting the model parameters manually would make the most sense if your model is simple enough (some of the issues from within ATLAS involve models with > 100 parameters, so it becomes rather intractable to hunt down all the problematic parameters).

In ROOT, I will point out that this is enabled by default, which means ROOT is using Minuit with yields forced to be positive. I think it would be useful to understand why SciPy seems to be able to converge in these cases while Minuit [which is more strict and careful] will rightfully complain.

Does this indicate, even with the comparison studies you've done, that SciPy is wrong and should be more strict? Or is the argument you're making that for poorly-defined workspaces, you throw everything out of the window when it comes to comparing Minuit and SciPy?

@kratsg
Contributor Author

kratsg commented Apr 21, 2022

To add to my previous comment: this workaround is not going to work well, because by clipping the predictions you introduce discontinuities in the cost function, but Minuit and other gradient-based minimisers expect that the cost function is analytical, an infinitesimal changes in the parameters produces an infinitesimal change in the value of the cost function. When you clip, this is no longer the case.

Yeah, this is actually the reason why I and others like @alexander-held are surprised to see this turned on by default in ROOT; we're not entirely sure why. We can add specific documentation in the docstring to say that this option is "dangerous" (or that users should be careful trusting any results that need this option).

@HDembinski
Member

@HDembinski , In general, I would agree that well-designed likelihood should not spit out negative yields. I think it happens when you have a lot of shape systematics that are uncorrelated but pulled in the same direction. I think limiting the model parameters manually would make the most sense if your model is simple enough (some of the issues from within ATLAS correspond with models with > 100 parameters, so it becomes slightly more intractable to hunt down all problematic parameters).

I think it is not so difficult, because most of the parameters are shape parameters. You can change those as you like; it will not generate negative bin expectations. You may get zero expectation in some bin, which is also bad, but easier to work around. You cannot work around negative expectations. The main problem is the use of Chebychev polynomials for the background and allowing amplitudes of pdfs to be negative. Both are bad traditions that we need to break away from.

In ROOT, I will point out that this is enabled by default, which means ROOT is using Minuit with the forced positive-definite.

Can you please link to that code in ROOT, because I don't know what you mean? You talk about positive-definiteness, but that is a different thing from what I am talking about. Positive-definiteness means that the curvature of the cost function is positive in all directions. The matrix of second derivatives has to be positive-definite for Minuit/Migrad both in ROOT and in iminuit.

I think it would be useful to understand why SciPy seems to be able to converge in these cases while Minuit [which is more strict and careful] will rightfully complain.

Minuit's Migrad algorithm is not more strict and careful. On the contrary, Migrad as implemented in Minuit2, is less robust than some of the algorithms in SciPy in my experience. Migrad cannot recover well from situations where the cost function returns NaN. I think there is a potential for improvement there, but it is not an easy fix.

Does this indicate, even with the comparison studies you've done, that SciPy is wrong and should be more strict? Or is the argument you're making that for poorly-defined workspaces, you throw everything out of the window when it comes to comparing Minuit and SciPy?

My comparisons indicate that sometimes Migrad works better than other minimisers and sometimes the other minimisers work better. It depends on the specific problem. My argument is that people need to define their models properly and there is little that you can do on the level of pyhf to fix bad models constructed by users. Another (unrelated) common problem is that users generate models with strongly correlated parameters. There is also nothing you can do in pyhf to prevent that. You should educate your users about this as I am doing.

@kratsg
Contributor Author

kratsg commented Apr 26, 2022

Can you please link to that code in ROOT, because I don't know what you mean?

Hans, thanks for being patient and sorry for not responding more quickly. I will quote the email from @alexander-held, but this was pointed out by @mdhank, who did the digging and found the relevant code:

Michael pointed me to https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/PiecewiseInterpolation.cxx#L297-L305. ROOT implements a way to force positive-definite predictions. This seems to be used by default: in HistoToWorkspaceFactoryFast::MakeLinInterpWithConstraint [1], which is used in HistoToWorkspaceFactoryFast::MakeSingleChannelWorkspace [2], which is called via HistoToWorkspaceFactoryFast::MakeSingleChannelModel from RooStats::HistFactory::MakeModelAndMeasurementFast. It naively does not look configurable to me either, so I assume most/all HistFactory models in ATLAS would be using this.

[1] https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/HistoToWorkspaceFactoryFast.cxx#L468
[2] https://github.com/root-project/root/blob/4c669d4ead82c3b328eaebc7f160a119c4dad31d/roofit/histfactory/src/HistoToWorkspaceFactoryFast.cxx#L1318
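For orientation, a conceptual Python sketch of the behavior the linked ROOT code appears to implement (this is an assumption from reading the linked lines, not a transcription of the C++): when the positive-definite flag is set, a negative interpolated yield is simply replaced by zero.

```python
def root_style_clip(interpolated_sum, positive_definite=True):
    # If the positive-definite flag is set and the interpolated yield
    # came out negative, floor it at zero; otherwise pass it through.
    if positive_definite and interpolated_sum < 0.0:
        return 0.0
    return interpolated_sum

print(root_style_clip(-3.2), root_style_clip(4.1))  # 0.0 4.1
```

This corresponds to the bin-level clipping option in this PR, applied after interpolation rather than per sample.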

You talk about positive-definiteness, but that is a different thing from what I am talking about. Positive-definiteness means that the curvature of the cost function is positive in all directions. The matrix of second derivatives has to be positive-definite for Minuit/Migrad both in ROOT and in iminuit.

I agree. I'm probably making this a bit confusing because "positive-definite" is the name that ROOT seems to have chosen, but looking at the code, it doesn't seem to be related to derivatives or the cost function, but rather simply to the yields. When I refer to "positive-definite" here, I'm using ROOT's phrasing rather than my understanding of what it means in minimization, which is what you explained. That is why I chose a different name for the option in pyhf: this isn't changing or touching the second derivatives.

Minuit's Migrad algorithm is not more strict and careful. On the contrary, Migrad as implemented in Minuit2, is less robust than some of the algorithms in SciPy in my experience. Migrad cannot recover well from situations where the cost function returns NaN. I think there is a potential for improvement there, but it is not an easy fix.

Noted, I think I understand where you're coming from. I still would like to provide this functionality in pyhf to at least match what exists in ROOT, but keep it as non-default. I do want to add a more verbose description in the documentation to clarify the expectations of the minimizers and of the users, to make sure people understand that it is not up to pyhf to enforce that workspaces are well-defined, but we'll do our best.

@alexander-held
Member

I think it is not so difficult, because most of the parameters are shape parameters. You can change those as you like, it will not generate negative bin expectations.

In the context of pyhf, "shape" variations are encoded by histosys modifiers, which leave the total yield across all bins in a channel unchanged. If the shape variation is sufficiently strong, a change of the related parameter can drive the predicted yield in some bins negative (compensating for increases in yield in other bins). An example of that is in this gist for the case of two histosys modifiers; the same can also happen for a single histosys modifier if the parameter is pulled beyond one sigma. A way to address this in practice is the use of normsys modifiers that provide exponential extrapolation, but any remaining shape effects can still cause this behavior in principle.
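A small numeric sketch of this mechanism (the yields and the piecewise-linear interpolation here are assumed for illustration; pyhf's actual histosys interpolation codes differ in detail): a shape variation that conserves the channel total can still drive one bin negative once its parameter is pulled hard enough.

```python
nominal = [10.0, 10.0]
up      = [18.0,  2.0]   # same total yield as nominal, different shape

def interpolate(alpha):
    # simple linear extrapolation of the up variation:
    # alpha = 0 gives nominal, alpha = 1 gives the up template
    return [n + alpha * (u - n) for n, u in zip(nominal, up)]

print(interpolate(2.0))  # [26.0, -6.0]: total conserved, bin 1 negative
```

At `alpha = 2` the total yield is still 20, matching the nominal, but bin 1 has been driven to -6: exactly the situation the clipping options guard against.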

@alexander-held
Member

#1884 might also be of interest in this context: it's not specifically about protecting against negative yields, but protecting against NaNs due to ill-defined templates. I don't know where/how exactly ROOT protects against this.

@matthewfeickert matthewfeickert force-pushed the feat/clipPoissonRates branch from 3f35195 to eec1b82 on June 20, 2022 22:51
Member

@matthewfeickert matthewfeickert left a comment


I still would like to provide this functionality in pyhf to at least match what exists in ROOT, but keep it as non-default. I do want to add a more verbose description in the documentation to clarify the expectations of the minimizers and of the users, to make sure people understand that it is not up to pyhf to enforce that workspaces are well-defined, but we'll do our best.

@kratsg (thanks for your patience and I hope that I never do such a poor job of responding to you again) I think this is a reasonable path forward for now. Though to avoid more delays on this I think making a dedicated docs Issue for the descriptions/discussions would be a good idea.

Member

@alexander-held alexander-held left a comment


I have not tested this very systematically, but have used this for a few problematic workspaces in the past few weeks and have not run into issues with the implementation. It does achieve the intended effect in the examples I have run.

@mdhank

mdhank commented Jun 21, 2022

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@matthewfeickert
Member

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@mdhank Thanks for testing this out. I'm quite behind on things at the moment (this PR is a testament to that), but if you have examples of where this is helping, can you share them?

@matthewfeickert matthewfeickert merged commit 474c7e3 into master Jun 22, 2022
@matthewfeickert matthewfeickert deleted the feat/clipPoissonRates branch June 22, 2022 23:05
matthewfeickert added a commit that referenced this pull request Jun 23, 2022
* Update lower bound on iminuit to v2.7.0 as it is the 'oldest' SemVer release
that has the behavior:
> Minuit.hesse no longer sets the status of the FunctionMinimum unconditionally
> to valid if it was invalid before.

Without this fix, the situation can arise where `Minuit.hesse()` can cause
`Minuit.valid` to be True when it should be False.
* Update tests/constraints.txt to use iminuit==2.7.0.
* Amends PR #1845
@mdhank

mdhank commented Jun 23, 2022

I have also tested this on some of my workspaces, and I've found it does seem to work as intended in my cases. Interestingly, it seems clipping bins might improve stability more than clipping samples; I've run into some cases where the former works but not the latter.

@mdhank Thanks for testing this out. I'm quite behind on things at the moment (this PR is a testament to that), but if you have examples of where this is helping, can you share them?

The workspaces in question contain non-public ATLAS results, but I have sent versions of them to the pyhf developers via e-mail in March-April.

