Paper • Requirements • Contributing • Citation
We show how to compute confidence intervals for the Error Consistency (EC) metric, which measures whether two classifiers make errors on the same samples. EC has also been used to quantify the behavioral alignment between humans and DNN models of vision. Here, we introduce a new computational model of EC that phrases the metric in terms of the probability that one observer copies responses from the other. Revisiting earlier results, we find that model rankings based on the existing data are unreliable.
This repo contains code to compute confidence intervals for EC, conduct significance tests, and plan sufficiently powerful experiments.
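For orientation, here is a minimal sketch of the underlying EC statistic together with a simple percentile bootstrap over trials, written with NumPy. This is an illustration only: the function names (error_consistency, bootstrap_ci) and the naive bootstrap are assumptions for this sketch, not the package's actual API, which provides its own routines for intervals, tests, and experiment planning.

import numpy as np

def error_consistency(correct_a, correct_b):
    # Trial-wise correctness vectors (booleans) of two observers.
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    c_obs = np.mean(correct_a == correct_b)        # observed consistency
    p_a, p_b = correct_a.mean(), correct_b.mean()  # accuracies
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)      # consistency expected by chance
    return (c_obs - c_exp) / (1 - c_exp)           # kappa-style error consistency

def bootstrap_ci(correct_a, correct_b, n_boot=10_000, alpha=0.05, seed=0):
    # Percentile bootstrap over trials; a simple baseline for illustration.
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(correct_a)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample trials with replacement
        stats[i] = error_consistency(correct_a[idx], correct_b[idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

For example, bootstrap_ci(correct_a, correct_b) would return the lower and upper bounds of a 95% interval for the EC of the two correctness vectors.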
This repo requires Python 3.13 or newer. All other requirements are listed in requirements.txt and can be installed with pip.
We use Black and isort for code formatting, and automate this using pre-commit. We keep notebooks clean using nbstripout.
To install requirements and apply the style:
pip install -r requirements.txt
pre-commit install
pre-commit run --all-files
If the CI fails for a PR you'd like to merge, it means that your changes don't adhere to the style guide.
To fix this, run pre-commit run --all-files locally, then commit and push again.
If Black applies auto-formatting that you find unhelpful, you can force it to ignore a code block like this:
# fmt: off
print("...")
# fmt: on
We use pytest for unit testing. Install it with pip install pytest and then run pytest from the repository root to run all tests.
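A test might look like the following sketch; the file path tests/test_error_consistency.py and the locally defined helper are hypothetical and only illustrate the pytest conventions used here.

# tests/test_error_consistency.py (hypothetical example)
import numpy as np
import pytest

def error_consistency(correct_a, correct_b):
    # Stand-in for the repo's own implementation, defined here to keep the example self-contained.
    c_obs = np.mean(np.asarray(correct_a) == np.asarray(correct_b))
    p_a, p_b = np.mean(correct_a), np.mean(correct_b)
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (c_obs - c_exp) / (1 - c_exp)

def test_identical_observers_have_error_consistency_one():
    correct = np.array([True, False, True, True, False])
    assert error_consistency(correct, correct) == pytest.approx(1.0)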
If you found our work useful, please consider citing our paper:
@article{klein2025quantifying,
  title={Quantifying Uncertainty in Error Consistency: Towards Reliable Behavioral Comparison of Classifiers},
  author={Klein, Thomas and Meyen, Sascha and Brendel, Wieland and Wichmann, Felix A and Meding, Kristof},
  journal={arXiv preprint arXiv:2507.06645},
  year={2025}
}
