Releases · pytorch/test-infra

30 Sep 22:30

v20250930-222836

e936529

v20250930-222836 Latest

[autorevert] exclude unstable jobs (#7260)

Assets 13

| Asset | SHA-256 | Size | Uploaded |
| --- | --- | --- | --- |
| benchmark-regression-summary-report.zip | f40f33abdc68b2b1505ef2d5482f55993b9dcb7ba1c827385927bad4ca36881e | 31.7 MB | 2025-09-30T22:29:04Z |
| benchmark-results-uploader.zip | f2218517297dd259cc81b51b3cb1972b8acc6c0e4b7f4e821fd97e26735432e7 | 14.5 MB | 2025-09-30T22:28:59Z |
| buildkite-webhook-handler.zip | 487295263bc97129522a3baf1527fd3f11f26862ef2f78bef62d965fd87a5574 | 14.5 MB | 2025-09-30T22:29:01Z |
| ci-queue-pct.zip | 9e2ddf1dfeea6f72d248ee7b359e605bbc7a3fbb258181d8fc9ebaded5e43d44 | 17.6 MB | 2025-09-30T22:29:02Z |
| keep-going-call-log-classifier.zip | 966c9190a52976a13d4bbc5d8d2b6b0d2cee8259fe855585981ecf04fa92033b | 659 Bytes | 2025-09-30T22:28:49Z |
| oss-ci-cur.zip | 62df76981d5af7bb75b1b4c83bd11feca22028c3cb25d8f5520a02853fcc92bd | 22.8 MB | 2025-09-30T22:28:59Z |
| oss-ci-job-queue-time.zip | 24fac6791624590d1d8eb05beb4980fa37f3be25cec1ac69488e7bd5b14c31bc | 31.4 MB | 2025-09-30T22:29:00Z |
| pytorch-auto-revert.zip | b5433b72b853e39b384a76115756b1361db1dd1e27889eeff9a92c643b53b773 | 31.4 MB | 2025-09-30T22:29:01Z |
| runner-binaries-syncer.zip | 49f05e5f1c2f152f426fbb7137dfa89589b7eb01106625d2fc1aa47c0bd83351 | 633 KB | 2025-09-30T22:30:08Z |
| runners.zip | 13ec3b518a845ffd6e06e306400ca6b205d3a720bd8df97debf312ca70ff7431 | 986 KB | 2025-09-30T22:30:08Z |
| Source code (zip) | | | 2025-09-30T22:28:12Z |
| Source code (tar.gz) | | | 2025-09-30T22:28:12Z |

30 Sep 13:45

github-actions

v20250930-134331

99554ad

v20250930-134331

[AUTOREVERT] [BUGFIX] fixing typo in variable name preventing revert …

Assets 13

30 Sep 12:59

github-actions

v20250930-125800

53c6bdf

v20250930-125800

[autorevert] correctly fetch and build the gaps in the signal (#7248)

1. Fixed commits-without-jobs issue

   - Problem: Commits with no workflow jobs (e.g., periodic workflow) were excluded from signal extraction
   - Solution:
     - Added `fetch_commits_in_time_range()` to query push table directly
     - Modified job query to filter by explicit list of head_shas instead of JOIN
     - Changed ORDER BY to use sha dimension first (preserves grouping, actual order doesn't matter as internally extractors now iterate over the list of commits passed explicitly)

2. Added mandatory timestamp field to SignalCommit

   - Changes:
     - `SignalCommit.__init__(head_sha, timestamp, events)` - timestamp is now mandatory
     - Signal extraction populates timestamps from push table
     - HUD state logger uses commit timestamp instead of computing from event times
     - Updated 36 test constructor calls
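
The two changes above can be pictured with a minimal sketch. It is not the PR's code: `SignalCommitSketch`, `build_commits`, the `(sha, timestamp)` push tuples, and the use of plain strings for events are all assumptions made for illustration. It only shows why iterating over an explicit commit list keeps commits that have no jobs, and how a mandatory timestamp can come from the push record rather than from event times.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class SignalCommitSketch:
    """Hypothetical stand-in for SignalCommit; all three fields are required."""

    head_sha: str
    timestamp: datetime  # taken from the push record, not derived from events
    events: List[str]    # placeholder type; the real events are richer objects


def build_commits(
    pushes: List[Tuple[str, datetime]],
    jobs_by_sha: Dict[str, List[str]],
) -> List[SignalCommitSketch]:
    """Iterate the explicit commit list so commits without jobs are kept."""
    return [
        SignalCommitSketch(sha, ts, list(jobs_by_sha.get(sha, [])))
        for sha, ts in pushes
    ]


if __name__ == "__main__":
    newer = datetime(2025, 9, 29, 21, 0, tzinfo=timezone.utc)
    older = datetime(2025, 9, 29, 20, 0, tzinfo=timezone.utc)
    # "bbb" has no jobs at all, yet it still appears in the result.
    for commit in build_commits([("bbb", newer), ("aaa", older)], {"aaa": ["job-1"]}):
        print(commit.head_sha, commit.timestamp.isoformat(), commit.events)
```

With a JOIN-based query, a commit such as `bbb` would simply be absent from the result set; driving the loop from the commit list makes the gap visible instead.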
    
    
    
### Testing

Before:
  

[2025-09-29T19-29-47.670686-00-00.html](https://github.com/user-attachments/files/22606856/2025-09-29T19-29-47.670686-00-00.html)


After:

[2025-09-29T21-38-10.190584-00-00.html](https://github.com/user-attachments/files/22606859/2025-09-29T21-38-10.190584-00-00.html)

Assets 13

29 Sep 23:10

github-actions

v20250929-230908

44b32da

v20250929-230908

[autorevert] fix indentation in `fetch_tests_for_job_ids` (#7250)

Noticed another bug introduced by
https://github.com/pytorch/test-infra/pull/7241 while testing locally
with large lookback windows:

```
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --bisection-limit 2   --hud-html
2025-09-29 15:56:16,356 INFO [root] [v2] Start: workflows=periodic hours=256 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650
2025-09-29 15:56:16,356 INFO [root] [v2] Run timestamp (CH log ts) = 2025-09-29T22:56:16.356213+00:00
2025-09-29 15:56:16,356 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching commits in time range: repo=pytorch/pytorch lookback=256h
2025-09-29 15:56:16,909 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Commits fetched: 419 commits in 0.55s
2025-09-29 15:56:16,909 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=periodic commits=419 lookback=256h
2025-09-29 15:56:56,850 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 2848 rows in 39.94s
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching tests for 1077 job_ids (453 failed jobs) in batches
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 1/2 (size=1024)
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] existing rows: 0
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 2/2 (size=53)
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] existing rows: 0
2025-09-29 15:56:57,718 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Tests fetched: 265 rows for 1077 job_ids in 0.86s
```

Notice that no tests are read in the first batch!


After this fix:
```
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256   --hud-html
2025-09-29 16:03:06,896 INFO [root] [v2] Start: workflows=periodic hours=256 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650
2025-09-29 16:03:06,896 INFO [root] [v2] Run timestamp (CH log ts) = 2025-09-29T23:03:06.896595+00:00
2025-09-29 16:03:06,897 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=periodic lookback=256h
2025-09-29 16:03:49,456 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 2887 rows in 42.56s
2025-09-29 16:03:49,466 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching tests for 1113 job_ids (454 failed jobs) in batches
2025-09-29 16:03:49,466 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 1/2 (size=1024)
2025-09-29 16:03:51,753 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 2/2 (size=89)
2025-09-29 16:03:53,056 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Tests fetched: 5002 rows for 1113 job_ids in 3.59s
2025-09-29 16:03:53,122 INFO [root] [v2] Extracted 144 signals
```
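
The source does not show the faulty code in `fetch_tests_for_job_ids`, so the following is only an illustration of the class of mistake the title points at, not a reconstruction of it. The function name `fetch_tests_in_batches` and the `fetch_batch` callback are hypothetical; the batch size of 1024 is inferred from the `size=1024` log lines above.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]


def fetch_tests_in_batches(
    job_ids: Sequence[int],
    fetch_batch: Callable[[Sequence[int]], List[Row]],
    batch_size: int = 1024,
) -> List[Row]:
    """Fetch test rows for job_ids in fixed-size batches and concatenate them."""
    rows: List[Row] = []
    for start in range(0, len(job_ids), batch_size):
        batch = job_ids[start:start + batch_size]
        # Both the query and the accumulation belong inside the loop body.
        # If either statement is dedented by one level, it runs once after
        # the loop and only the final batch contributes rows.
        rows.extend(fetch_batch(batch))
    return rows


if __name__ == "__main__":
    sizes: List[int] = []

    def fake_fetch(batch: Sequence[int]) -> List[Row]:
        sizes.append(len(batch))
        return [(job_id, "test") for job_id in batch]

    rows = fetch_tests_in_batches(list(range(1077)), fake_fetch)
    print(sizes, len(rows))
```

With 1077 job ids the sketch issues two batches of 1024 and 53, matching the batch sizes in the first log.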

Assets 13

29 Sep 19:24

github-actions

v20250929-192231

b6d478f

v20250929-192231

[autorevert] fix local cli (#7244)

Before:

```
(venv) ivanzaitsev@ivanzaitsev-mbp pytorch-auto-revert % python -m pytorch_auto_revert hud
2025-09-29 12:12:37,159 WARNING [pytorch_auto_revert.clickhouse_client_helper] Connection test failed: HTTPDriver for https://hyt81izu0c.us-east-1.aws.clickhouse.cloud:8443 received ClickHouse error code 516
 Code: 516. DB::Exception: revert_lambda: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 25.6.2.6151 (official build))

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ivanzaitsev/test-infra/aws/lambda/pytorch-auto-revert/pytorch_auto_revert/__main__.py", line 336, in <module>
    main()
  File "/Users/ivanzaitsev/test-infra/aws/lambda/pytorch-auto-revert/pytorch_auto_revert/__main__.py", line 283, in main
    raise RuntimeError(
RuntimeError: ClickHouse connection test failed. Please check your configuration.
```

After:
```
(venv) ivanzaitsev@ivanzaitsev-mbp pytorch-auto-revert % python -m pytorch_auto_revert hud
2025-09-29 12:18:23,118 INFO [root] [hud] Fetching run state ts=2025-09-29 19:13:18 repo=<any>
2025-09-29 12:18:23,521 INFO [root] [hud] Loaded state for repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor
2025-09-29 12:18:23,521 INFO [root] [hud] Rendering HTML for repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor lookback=16 → 2025-09-29_19-13-18.html
2025-09-29 12:18:23,523 INFO [root] HUD written to 2025-09-29_19-13-18.html
(venv) ivanzaitsev@ivanzaitsev-mbp pytorch-auto-revert %
```

Assets 13

29 Sep 18:47

github-actions

v20250929-184550

60e16ae

v20250929-184550

[PYTORCHBOT] adds 'autorevert' classification for reverts (#7242)

Autorevert should issue revert commands in the format
`&pytorchbot revert -m "message" -c autorevert`.

This change enables pytorchbot to accept this classification.

Assets 12

29 Sep 18:30

github-actions

v20250929-182904

73efcae

v20250929-182904

[autorevert] fix RetryWithBackoff, add tests (#7243)

A follow-up to https://github.com/pytorch/test-infra/pull/7241.

Fixes the logic and adds unit tests.

Assets 13

29 Sep 16:18

github-actions

v20250929-161641

aa5c240

v20250929-161641

[AUTOREVERT] use secret store over environment variables for password…

Assets 13

29 Sep 16:01

github-actions

v20250929-155929

6ec1bf7

v20250929-155929

[AUTOREVERT] Add retry with back-off for GH API and CH (#7241)

This PR goes through the code, finds each place where we call an
external API, and adds a retry with exponential back-off.

Defaults to 5 retries, a 0.5s base delay, and 10% jitter.

There are NO CHANGES to the existing logic; all relevant parts of the
code are simply guarded with:

```
for attempt in RetryWithBackoff():
    with attempt:
        # the code 
```

The diff appears large because of:

* Extra indentation (tabs) and the consequent linter changes
* The lazy nature of the GH and CH libraries, which resolve pagination
as the code consumes the results
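
For readers unfamiliar with the `for attempt in ...: with attempt:` idiom, here is a minimal sketch of how such a helper can be built. It is not the implementation from the PR: the class names, the injectable `sleep` parameter, and the reading of "5 retries" as five attempts in total are assumptions. Only the defaults (5, 0.5s, 10%) come from the description above.

```python
import random
import time


class _Attempt:
    """Context manager for a single try."""

    def __init__(self, is_last: bool) -> None:
        self.is_last = is_last
        self.succeeded = False

    def __enter__(self) -> "_Attempt":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.succeeded = True
            return False
        # Swallow ordinary exceptions so the loop can retry; let the final
        # failure (and things like KeyboardInterrupt) propagate.
        return (not self.is_last) and issubclass(exc_type, Exception)


class RetryWithBackoffSketch:
    """Yields attempts until one succeeds or the budget is exhausted."""

    def __init__(self, max_retries=5, base_delay=0.5, jitter=0.1, sleep=time.sleep):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.jitter = jitter
        self.sleep = sleep  # injectable so tests do not actually wait

    def __iter__(self):
        for i in range(self.max_retries):
            attempt = _Attempt(is_last=(i == self.max_retries - 1))
            yield attempt
            if attempt.succeeded:
                return
            # Exponential back-off: base * 2^i, perturbed by +/- jitter.
            delay = self.base_delay * (2 ** i)
            delay += delay * random.uniform(-self.jitter, self.jitter)
            self.sleep(delay)


if __name__ == "__main__":
    calls = 0
    for attempt in RetryWithBackoffSketch(sleep=lambda s: None):
        with attempt:
            calls += 1
            if calls < 3:
                raise ConnectionError("transient failure")
    print("succeeded after", calls, "calls")
```

The lazy-pagination point matters for this pattern: if a client library only performs the network request when results are iterated, the iteration itself has to sit inside the `with attempt:` block, otherwise the failure happens outside the guarded region.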

Assets 13

29 Sep 12:42

github-actions

v20250929-124114

06985bf

v20250929-124114

[autorevert] fix handling for insufficient successes (#7235)

Previously the code was trying to group branches for restarts resulting
from "infra check" and from "insufficient events", and this was a
mistake, resulting in delayed restarts.

Specifically, in this situation:
<img width="999" height="747" alt="image"
src="https://github.com/user-attachments/assets/9cd0051e-8d87-4fe2-af90-88a776847c4d"
/>
a restart on the success side is expected, but the system waits for a
pending job on the failure side.


This PR decouples and simplifies the logic. Now, all restarts are
scheduled independently (relying on set deduplication) and all final
checks are performed afterwards.

Added a unit test to specifically verify the case above.
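
The "schedule independently, deduplicate via a set" idea can be sketched as follows. The function name `plan_restarts` and the `(workflow, sha)` tuple shape are hypothetical and chosen only for illustration; the actual data structures in the PR are not shown in this description.

```python
from typing import Iterable, Set, Tuple

Restart = Tuple[str, str]  # hypothetical shape: (workflow name, commit sha)


def plan_restarts(
    infra_check_restarts: Iterable[Restart],
    insufficient_event_restarts: Iterable[Restart],
) -> Set[Restart]:
    """Collect restarts from both sources without coupling their conditions."""
    restarts: Set[Restart] = set()
    # Each source contributes on its own; neither waits on the other.
    restarts.update(infra_check_restarts)
    restarts.update(insufficient_event_restarts)
    # The set removes any restart that both sources asked for.
    return restarts


if __name__ == "__main__":
    planned = plan_restarts(
        [("trunk", "abc123")],
        [("trunk", "abc123"), ("trunk", "def456")],
    )
    print(sorted(planned))
```

Because duplicates collapse automatically, each check can request whatever restarts it needs without knowing what the other check requested, and the final checks can then run once over the combined result.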

Assets 13
