Releases: pytorch/test-infra

v20251003-173111

03 Oct 17:32
3713fec
[AUTOREVERT] add env vars REVERT_ACTION and RESTART_ACTION so we can …

v20251002-175443

02 Oct 17:56
0ffb888
[AUTOREVERT] fix bug ignoring workflow dispatch errors (#7277)

Signed-off-by: Jean Schmidt <[email protected]>

v20251002-162215

02 Oct 16:24
5903a75
[autorevert] Inject synthetic PENDING events for pending workflows in…

v20251002-150750

02 Oct 15:09
e8dfdb6
[autorevert] Fix pacing query logic (#7274)

`any` has unexpected semantics in ClickHouse (CH): it returns the [first
value](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/any)
it encounters, not whether any value is true. One correct way (among many)
to check whether any value is true is to use `countIf`.

The effect of this bug was that pacing did not work in rare cases where a
commit had multiple events and only some of them matched the condition.

For example, when the first event falls out of the window and a second event
is added, the condition yields two rows, 0 and 1, and `any` returns either
one depending on the arbitrary row order.

Testing:

```
SELECT
  (countIf(failed = 0 AND ts > now() - toIntervalSecond(5200)) > 0) AS has_success_within_window,
  any(failed = 0 AND ts > now() - toIntervalSecond(5200)) AS has_success_within_window_old
FROM misc.autorevert_events_v2
WHERE repo = 'pytorch/pytorch'
  AND action = 'restart'
  AND dry_run = 0
  AND commit_sha = 'b5c4f46bb9ede8dc6adf11975c93b9f285d9ed67'
```

Result:
```
"has_success_within_window","has_success_within_window_old"
"1","0"
```
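
The difference between the two expressions can be illustrated outside ClickHouse. The following is a minimal Python sketch, not the project's code: the `(failed, age_seconds)` row shape and the helper names `matches`, `any_like`, and `count_if_like` are assumptions made for illustration. It mimics `any` by taking the condition from whichever row comes first, and `countIf` by counting matching rows.

```python
from typing import List, Tuple

# Hypothetical row shape: (failed flag, age of the event in seconds).
Event = Tuple[int, float]


def matches(event: Event, window_s: float = 5200) -> bool:
    """Condition from the query: a successful event inside the window."""
    failed, age_s = event
    return failed == 0 and age_s < window_s


def any_like(events: List[Event], window_s: float = 5200) -> bool:
    """Mimics any(cond): the condition of the first row encountered."""
    return matches(events[0], window_s)


def count_if_like(events: List[Event], window_s: float = 5200) -> bool:
    """Mimics countIf(cond) > 0: true if at least one row matches."""
    return sum(1 for e in events if matches(e, window_s)) > 0


# One event has aged out of the window, a newer one is still inside it.
events = [(0, 9000.0), (0, 100.0)]
for order in (events, list(reversed(events))):
    print(any_like(order), count_if_like(order))
```

With these two rows, the `any`-style result flips with the row order, while the `countIf`-style result is the same for both orders.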



more testing:

```
python -m pytorch_auto_revert --dry-run autorevert-checker Lint trunk pull inductor rocm rocm-mi300 --hours 18 --hud-html
```

v20251001-182920

01 Oct 18:31
2315118
[autorevert] Add 'linux-aarch64' to default workflows (#7268)

see the list of viable strict workflows:
https://github.com/pytorch/pytorch/pull/164374/files

testing:

```
HOURS=18 python -m pytorch_auto_revert --dry-run
2025-10-01 11:19:05,293 INFO [root] [v2] Start: workflows=Lint,trunk,pull,inductor,linux-aarch64 hours=18 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650 bisection=unlimited
2025-10-01 11:19:05,293 INFO [root] [v2] Run timestamp (CH log ts) = 2025-10-01T18:19:05.293306+00:00
2025-10-01 11:19:05,294 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching commits in time range: repo=pytorch/pytorch lookback=18h
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Commits fetched: 47 commits in 0.76s
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor,linux-aarch64 commits=47 lookback=18h
2025-10-01 11:20:14,477 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 7058 rows in 68.42s
2025-10-01 11:20:14,539 INFO [root] [v2] Extracted 1 signals
2025-10-01 11:20:14,539 INFO [root] [v2][signal] wf=inductor key=inductor-test / test outcome=Ineligible(reason=<IneligibleReason.FLAKY: 'flaky'>, message='signal is flaky (mixed outcomes on same commit)')
2025-10-01 11:20:14,539 INFO [root] [v2] Candidate action groups: 0
2025-10-01 11:20:14,539 INFO [root] [v2] Executed action groups: 0
2025-10-01 11:20:15,101 INFO [root] [v2] State logged
```

v20251001-181055

01 Oct 18:12
c7f01a8
[autorevert] Implement autobisect functionality (#7238)

Testing on the periodic workflow (on top of
https://github.com/pytorch/test-infra/pull/7248):

```
python -m pytorch_auto_revert autorevert-checker periodic --hours 128 --bisection-limit 2 --hud-html
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --bisection-limit 2 --hud-html
```


[2025-09-29T22-00-27.941916-00-00.html](https://github.com/user-attachments/files/22607006/2025-09-29T22-00-27.941916-00-00.html)


[2025-09-29T22-03-58.012711-00-00.html](https://github.com/user-attachments/files/22607013/2025-09-29T22-03-58.012711-00-00.html)




----


Algorithm:

- Goal: Cover the “unknown” span between failure and success partitions by scheduling at most N new restarts, sampling widely via iterative bisection.
- Intuition: Always split the largest unknown gap; choose its midpoint; repeat until the budget is exhausted.

Inputs/Output

- Input covered: boolean list over the unknown region.
  - True = already covered/separator (e.g., pending), False = uncovered candidate.
- Input limit: optional int; total target coverage for this run.
  - Budget allowed = max(0, limit − sum(covered)); None = unlimited.
- Output: boolean list of equal length; True marks indices to newly cover (schedule now).

Procedure

- If limit is None: return NOT covered (select all uncovered).
- Else:
  - Build contiguous uncovered gaps (sequences of False) separated by True entries.
  - Push each gap into a max-heap keyed by (-length, lo, hi) using Gap(lo, hi):
    - length = hi − lo + 1
    - heap_key = (-length, lo, hi) for deterministic tie-breaking.
  - While allowed > 0 and heap not empty:
    - Pop largest gap g; pick mid = floor((g.lo + g.hi)/2); select mid; allowed -= 1.
    - Push back sub-gaps [g.lo, mid-1] and [mid+1, g.hi] if non-empty.
  - Return the selection mask.

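Properties

- Deterministic ties (equal-length gaps) prefer lower lo.
- Already-covered (pending) entries both reduce the budget and split gaps, pacing new work naturally.
- If limit ≤ current_covered → allowed = 0 → no new selections.
- Complexity: O(n + A log(G + A)), where n = length of covered, A = number of picks (≤ allowed), G = initial number of gaps. The O(n) term is the scan that builds the gaps; each pick pops one gap and pushes at most two, so the heap never holds more than G + A entries.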

Integration in signal processing

- PartitionedCommits.cover_gap_unknown_commits:
  - Builds covered mask for the unknown partition: pending=True (separator), missing=False (candidate).
  - Calls the planner; maps selected indices back to commit SHAs to restart.
- process_valid_autorevert_pattern(bisection_limit=...):
  - Applies gap-cover selections, then independently applies failure-/success-side restarts based on infra and threshold heuristics.
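
The planner described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description, not the code merged in the PR: the function name `plan_gap_cover` is an assumption, and plain `(-length, lo, hi)` tuples stand in for the `Gap(lo, hi)` type mentioned in the text.

```python
import heapq
from typing import List, Optional, Tuple


def plan_gap_cover(covered: List[bool], limit: Optional[int] = None) -> List[bool]:
    """Return a mask of indices to newly cover, splitting the largest gap first."""
    n = len(covered)
    if limit is None:
        # Unlimited budget: select every uncovered index.
        return [not c for c in covered]

    # Already-covered entries count against the target coverage.
    allowed = max(0, limit - sum(covered))
    selected = [False] * n

    # Collect contiguous runs of False as (-length, lo, hi) entries.
    # heapq is a min-heap, so negating the length pops the largest gap first;
    # ties on length fall back to the lower lo.
    heap: List[Tuple[int, int, int]] = []
    lo: Optional[int] = None
    for i, is_covered in enumerate(covered):
        if not is_covered and lo is None:
            lo = i
        elif is_covered and lo is not None:
            heap.append((-(i - lo), lo, i - 1))
            lo = None
    if lo is not None:
        heap.append((-(n - lo), lo, n - 1))
    heapq.heapify(heap)

    while allowed > 0 and heap:
        _, g_lo, g_hi = heapq.heappop(heap)
        mid = (g_lo + g_hi) // 2
        selected[mid] = True
        allowed -= 1
        # Push back the non-empty sub-gaps on either side of the midpoint.
        if g_lo <= mid - 1:
            heapq.heappush(heap, (-(mid - g_lo), g_lo, mid - 1))
        if mid + 1 <= g_hi:
            heapq.heappush(heap, (-(g_hi - mid), mid + 1, g_hi))
    return selected


# Seven unknown commits, none pending, target coverage of 3.
mask = plan_gap_cover([False] * 7, limit=3)
shas = [f"sha{i}" for i in range(7)]  # hypothetical commit SHAs
to_restart = [sha for sha, pick in zip(shas, mask) if pick]
print(to_restart)  # ['sha1', 'sha3', 'sha5']
```

In this example the first pick is the midpoint of the whole span (index 3); the two resulting gaps of length 3 tie, so the one with the lower `lo` is split first (index 1), followed by the other (index 5).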

---------

Co-authored-by: Copilot <[email protected]>

v20251001-180704

01 Oct 18:08
3e1acbd
[AUTOREVERT] Makefile targets pointing to canary (#7267)

Sets the Makefile targets to point to `pytorch/pytorch-canary` as an
example.

v20251001-163637

01 Oct 16:38
277f605
[autorevert]  add job & hud links to the autorevert message and debug…

v20250930-222836

30 Sep 22:30
e936529
[autorevert] exclude unstable jobs (#7260)

v20250930-134331

30 Sep 13:45
99554ad
[AUTOREVERT] [BUGFIX] fixing typo in variable name preventing revert …