GitHub actions regression test #499

MarkWolters · 2025-07-14T21:57:06Z

This PR introduces a new GitHub Actions workflow and associated code to perform automated regression testing across branches. The workflow can be triggered manually, with the branches to compare passed as inputs. It also runs automatically when a PR to main is opened, in which case it runs a regression test comparing the PR branch with main.

Metrics compared include QPS, average latency, and Recall@10. See the attachment for an example of the output.
benchmark_report.zip
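
To illustrate how a per-metric comparison between two branches might be labeled, here is a minimal sketch. Only the `HIGHER_IS_BETTER` and `LOWER_IS_BETTER` lists come from the `visualize_benchmarks.py` diff in this PR; the `classify_change` function, its parameters, and the 5% tolerance are hypothetical and are not taken from the actual script.

```python
# Metric direction lists as they appear in the visualize_benchmarks.py diff.
HIGHER_IS_BETTER = ["QPS", "Recall@10"]
LOWER_IS_BETTER = ["Mean Latency"]


def classify_change(metric, baseline, candidate, tolerance=0.05):
    """Label a metric change as 'improved', 'regressed', or 'unchanged'.

    Hypothetical helper: `tolerance` is an assumed relative threshold (5%),
    not a value used by the PR.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")

    # Relative change of the candidate branch with respect to the baseline.
    relative = (candidate - baseline) / abs(baseline)

    # Small differences are treated as noise rather than a real change.
    if abs(relative) <= tolerance:
        return "unchanged"
    if metric in HIGHER_IS_BETTER:
        return "improved" if relative > 0 else "regressed"
    if metric in LOWER_IS_BETTER:
        return "improved" if relative < 0 else "regressed"
    raise KeyError(f"unknown metric: {metric}")
```

Under these assumptions, a 20% rise in QPS would be labeled as an improvement, while a 20% rise in mean latency would be labeled as a regression.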

marianotepper · 2025-07-15T13:26:01Z

visualize_benchmarks.py

+HIGHER_IS_BETTER = ["QPS", "Recall@10"]
+LOWER_IS_BETTER = ["Mean Latency"]
+
+class BenchmarkData:


I suggest leaving two blank lines between top-level functions and two blank lines between the import statements and other code: https://stackoverflow.com/questions/2953250/python-pep8-blank-lines-convention

Done and committed

marianotepper · 2025-07-15T13:33:54Z

jvector-examples/src/main/java/io/github/jbellis/jvector/example/AutoBenchYAML.java

@@ -0,0 +1,183 @@
+/*


Should we tidy up the main example directory? IMO, it is getting polluted. Maybe we should add a "tools" directory or similar. We can do a separate PR for this, no need to do it here as there are other files that should probably go there too and that have nothing to do with this PR.

Noted and agreed

marianotepper · 2025-07-15T13:36:48Z

I suggest editing the description of this PR with a high-level explanation of what it contains/does. It would be great if it also contained an example of the output.

MarkWolters added 30 commits June 24, 2025 13:00

Added optional artifact creation to Bench and github workflow

e91a497

summarizing test results

8f8cbc2

updating run command

83a19b7

force workflow on checkin for now

a891048

warn not fail on avx12 not available

5c0f4f4

fixed command line filter parsing

62666df

workflow compilation fix

c9c48b0

added license

38c5a56

workflow changes

bf17e02

adding main class to manifest for examples

e5e3cf3

adding results comparison

240cf58

switching main class from Bench to BenchYAML

2cceb54

adding ad hoc run functionality

ec9803a

adding ad hoc run functionality

2636414

bumping version to 5 or tests dont work

77acf27

upping jdk from 20 to 22

e98912b

just jdk24

10da16e

moving gha to its own class

e89725c

merging main into gha branch

63a5636

gha problem with python env

75dd9b8

update dataset references

6e97afb

trying to get the datasets recognized

6dc062b

reducing datasets

cae9991

only run cohere-english-v3-100k

9711bad

adding logging for debugging purposes

a9324d7

more debug logging

30334ae

changing default main class to autobenchyaml

8092bb1

cleaning up extraneous logging

e9022c4

bug fix on comparison

0ebc186

fixed metric comparison

00973fb

MarkWolters added 9 commits July 11, 2025 11:47

workflow updates

30e2d1c

workflow updates

819a108

workflow updates

7ea329c

support arbitrary branches

20212c3

support arbitrary branches

47205a7

support arbitrary branches

097d3f6

refactoring branch comparison

b818c35

separate viz job

be91159

final workflow and added datasets

cddfaa8

MarkWolters requested review from jshook, tlwillke and marianotepper July 14, 2025 21:57

marianotepper reviewed Jul 15, 2025


MarkWolters added 2 commits July 15, 2025 12:31

update viz script for Python style guidelines

b9b317b

updated workflow to plot comparison in order of latest checkin

d19cedf

MarkWolters marked this pull request as ready for review July 16, 2025 15:49

MarkWolters added 12 commits July 17, 2025 17:17

added machine props annotations

ec7dd32

fixed issue with checkout on ad hoc run

1ef47e4

adding heap dump for OOM errors

170de03

increasing max heap size

1b1177e

Merge branch 'main' into github_actions

557811c

updated workflow to detect memory instead of hardcoded 8G

d32529e

adding infratest bucket for dataset downloads

0ffac8b

fixed bug introduced by merging main preventing qps stats

c7cd34e

update workflow to only run smaller datasets by default

36fc30d

added checkpointing and cohere datasets

3b511a5

added script to compare across iterations

f7e72e3

added metric for index build time

05c8c78
