
Add Hierarchical Performance Testing (HPT) technique to compare_to? #168

@mdboom

Description


I recently came across a technique called Hierarchical Performance Testing (HPT) for distilling benchmark measurements into a single number while taking into account that some benchmarks are more consistent/reliable than others. There is an implementation (in bash!!!) for the PARSEC benchmark suite. I ported it to Python and ran it over the big Faster CPython data set.

The results are pretty useful -- for example, while a lot of the main specialization work in 3.11 has a reliability of 100%, some recent changes to the GC show a speed improvement but with lower reliability, reflecting the fact that GC changes involve more randomness (more moving parts and more interaction with whatever else the OS is doing). I think this reliability number, along with the more stable "expected speedup at the 99th percentile", is much more useful than the geometric mean for evaluating a change (especially a small one). I did not, however, see the massive 3.5x discrepancy between the 99th-percentile number and the geometric mean that the paper reports (on a different dataset).
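For context, here is a rough sketch of the kind of computation HPT does as I understand it from the paper: a rank-sum test per benchmark to decide whether a difference is significant at all, then a signed-rank test across benchmarks to get an overall reliability and a speedup-at-confidence figure. The `hpt_compare` helper, the 0.05 threshold, and the 0.01 search step below are placeholders of mine, not the exact PARSEC formulation:

```python
# Sketch only -- not the ported script, and details may differ from the paper.
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon

def hpt_compare(baseline, candidate, alpha=0.05):
    """baseline/candidate: dict mapping benchmark name -> list of run times."""
    speedups = []
    for name, base_times in baseline.items():
        cand_times = candidate[name]
        # Per-benchmark rank-sum test: is the difference significant at all?
        _, p = mannwhitneyu(base_times, cand_times, alternative="two-sided")
        if p < alpha:
            speedups.append(np.median(base_times) / np.median(cand_times))
        else:
            speedups.append(1.0)  # treat an insignificant difference as a tie

    # Cross-benchmark signed-rank test on log-speedups: the reliability is the
    # confidence that the candidate is faster overall.
    _, p = wilcoxon(np.log(speedups), alternative="greater")
    reliability = 1.0 - p

    def speedup_at(confidence, step=0.01):
        # Largest factor s such that, after handicapping the candidate by s,
        # it would still be judged faster with the requested confidence.
        s = 1.0
        while True:
            _, p = wilcoxon(np.log(speedups) - np.log(s + step),
                            alternative="greater")
            if 1.0 - p < confidence:
                return s
            s += step

    return reliability, speedup_at(0.99)
```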

Is there interest in adding this metric to the output of pyperf's compare_to command?
