Releases: pytorch/test-infra
v20251003-173111
[AUTOREVERT] add env vars REVERT_ACTION and RESTART_ACTION so we can …
v20251002-175443
[AUTOREVERT] fix bug ignoring workflow dispatch errors (#7277) Signed-off-by: Jean Schmidt <[email protected]>
v20251002-162215
[autorevert] Inject synthetic PENDING events for pending workflows in…
v20251002-150750
[autorevert] Fix pacing query logic (#7274)

`any` has unexpected semantics in ClickHouse: it returns the [first value](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/any) it encounters rather than testing whether any value is true. The correct way to check whether any value is true is `countIf`.

The effect of this bug was that pacing did not work in some rare cases where a commit has multiple events and only some of them match the condition. For example, when the first event falls out of the window and a second event is added, we get two rows, 0 and 1, and `any` returns either one depending on the arbitrary row order. The fix (one of several possible) uses `countIf` instead.

Testing:

```
SELECT
    (countIf(failed = 0 AND ts > now() - toIntervalSecond(5200)) > 0) AS has_success_within_window,
    any(failed = 0 AND ts > now() - toIntervalSecond(5200)) AS has_success_within_window_old
FROM misc.autorevert_events_v2
WHERE repo = 'pytorch/pytorch'
    AND action = 'restart'
    AND dry_run = 0
    AND commit_sha = 'b5c4f46bb9ede8dc6adf11975c93b9f285d9ed67'
```

Result:

```
"has_success_within_window","has_success_within_window_old"
"1","0"
```

More testing:

```
python -m pytorch_auto_revert --dry-run autorevert-checker Lint trunk pull inductor rocm rocm-mi300 --hours 18 --hud-html
```
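To make the difference concrete, here is a minimal Python sketch of the two aggregation behaviours described in the pacing fix above. It is only a model, not ClickHouse itself: `any_like` and `count_if_positive` are hypothetical helper names, and `any_like` assumes the aggregate simply returns whichever row it happens to see first.

```python
from typing import Sequence


def any_like(rows: Sequence[int]) -> int:
    """Model of a 'first encountered value' aggregate (assumption: row order is arbitrary)."""
    return rows[0]


def count_if_positive(rows: Sequence[int]) -> int:
    """Model of `countIf(cond) > 0`: 1 if at least one row satisfies the condition."""
    return int(sum(1 for r in rows if r) > 0)


# Two events for the same commit: one outside the window (0), one inside (1).
order_a = [0, 1]
order_b = [1, 0]

# The 'first value' model depends on row order...
print(any_like(order_a), any_like(order_b))
# ...while the count-based check does not.
print(count_if_positive(order_a), count_if_positive(order_b))
```

Under this model the count-based check gives the same answer for both row orders, which is the property the pacing query needs.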
v20251001-182920
[autorevert] Add 'linux-aarch64' to default workflows (#7268)

See the list of viable strict workflows: https://github.com/pytorch/pytorch/pull/164374/files

Testing:

```
HOURS=18 python -m pytorch_auto_revert --dry-run
2025-10-01 11:19:05,293 INFO [root] [v2] Start: workflows=Lint,trunk,pull,inductor,linux-aarch64 hours=18 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650 bisection=unlimited
2025-10-01 11:19:05,293 INFO [root] [v2] Run timestamp (CH log ts) = 2025-10-01T18:19:05.293306+00:00
2025-10-01 11:19:05,294 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching commits in time range: repo=pytorch/pytorch lookback=18h
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Commits fetched: 47 commits in 0.76s
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor,linux-aarch64 commits=47 lookback=18h
2025-10-01 11:20:14,477 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 7058 rows in 68.42s
2025-10-01 11:20:14,539 INFO [root] [v2] Extracted 1 signals
2025-10-01 11:20:14,539 INFO [root] [v2][signal] wf=inductor key=inductor-test / test outcome=Ineligible(reason=<IneligibleReason.FLAKY: 'flaky'>, message='signal is flaky (mixed outcomes on same commit)')
2025-10-01 11:20:14,539 INFO [root] [v2] Candidate action groups: 0
2025-10-01 11:20:14,539 INFO [root] [v2] Executed action groups: 0
2025-10-01 11:20:15,101 INFO [root] [v2] State logged
```
v20251001-181055
[autorevert] Implement autobisect functionality (#7238)

Testing on the periodic workflow (on top of https://github.com/pytorch/test-infra/pull/7248):

```
python -m pytorch_auto_revert autorevert-checker periodic --hours 128 --bisection-limit 2 --hud-html
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --bisection-limit 2 --hud-html
```

[2025-09-29T22-00-27.941916-00-00.html](https://github.com/user-attachments/files/22607006/2025-09-29T22-00-27.941916-00-00.html)
[2025-09-29T22-03-58.012711-00-00.html](https://github.com/user-attachments/files/22607013/2025-09-29T22-03-58.012711-00-00.html)

----

Algorithm:

- Goal: Cover the “unknown” span between failure and success partitions by scheduling at most N new restarts, sampling widely via iterative bisection.
- Intuition: Always split the largest unknown gap; choose its midpoint; repeat until the budget is exhausted.

Inputs/Output

- Input covered: boolean list over the unknown region
  - True = already covered/separator (e.g., pending), False = uncovered candidate.
- Input limit: optional int; total target coverage for this run.
  - Budget allowed = max(0, limit − sum(covered)); None = unlimited.
- Output: boolean list of equal length; True marks indices to newly cover (schedule now).

Procedure

- If limit is None: return NOT covered (select all uncovered).
- Else:
  - Build contiguous uncovered gaps (sequences of False) separated by True entries.
  - Push each gap into a max-heap keyed by (-length, lo, hi) using Gap(lo, hi):
    - length = hi − lo + 1
    - heap_key = (-length, lo, hi) for deterministic tie-breaking.
  - While allowed > 0 and heap not empty:
    - Pop largest gap g; pick mid = floor((g.lo + g.hi)/2); select mid; allowed -= 1.
    - Push back sub-gaps [g.lo, mid-1] and [mid+1, g.hi] if non-empty.
  - Return the selection mask.

Properties

- Deterministic ties (equal-length gaps) prefer lower lo.
- Already-covered (pending) entries both reduce the budget and split gaps, pacing new work naturally.
- If limit ≤ current_covered → allowed = 0 → no new selections.
- Complexity: O(A log G), where A = number of picks (≤ allowed), G = initial number of gaps.

Integration in signal processing

- PartitionedCommits.cover_gap_unknown_commits:
  - Builds covered mask for the unknown partition: pending=True (separator), missing=False (candidate).
  - Calls the planner; maps selected indices back to commit SHAs to restart.
- process_valid_autorevert_pattern(bisection_limit=...):
  - Applies gap-cover selections, then independently applies failure-/success-side restarts based on infra and threshold heuristics.

---------

Co-authored-by: Copilot <[email protected]>
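The gap-cover procedure described in the autobisect entry above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description only, not the code merged in the PR: the function name `plan_gap_cover` is hypothetical, and plain tuples stand in for the `Gap(lo, hi)` type mentioned in the text.

```python
import heapq
from typing import List, Optional


def plan_gap_cover(covered: List[bool], limit: Optional[int]) -> List[bool]:
    """Return a mask of indices to newly cover, always splitting the largest gap."""
    n = len(covered)
    if limit is None:
        # Unlimited budget: select every uncovered index.
        return [not c for c in covered]

    allowed = max(0, limit - sum(covered))
    selected = [False] * n

    # Collect contiguous uncovered gaps as (-length, lo, hi) so that heapq
    # (a min-heap) pops the longest gap first and breaks ties on lower lo.
    heap = []
    lo = None
    for i, is_covered in enumerate(covered):
        if not is_covered and lo is None:
            lo = i
        elif is_covered and lo is not None:
            heap.append((-(i - lo), lo, i - 1))
            lo = None
    if lo is not None:
        heap.append((-(n - lo), lo, n - 1))
    heapq.heapify(heap)

    while allowed > 0 and heap:
        _, g_lo, g_hi = heapq.heappop(heap)
        mid = (g_lo + g_hi) // 2
        selected[mid] = True
        allowed -= 1
        # Push back the non-empty sub-gaps on either side of the midpoint.
        if g_lo <= mid - 1:
            heapq.heappush(heap, (-(mid - g_lo), g_lo, mid - 1))
        if mid + 1 <= g_hi:
            heapq.heappush(heap, (-(g_hi - mid), mid + 1, g_hi))
    return selected


# Seven unknown commits, budget of three restarts: picks indices 3, 1, 5.
print(plan_gap_cover([False] * 7, 3))
# One pending commit at index 2 uses up one unit of a budget of two.
print(plan_gap_cover([False, False, True, False, False, False], 2))
```

Negating the length in the heap key is what turns Python's min-heap into the max-heap the description calls for, while keeping `lo` as the second tuple element gives the deterministic tie-breaking listed under Properties.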
v20251001-180704
[AUTOREVERT] Makefile targets pointing to canary (#7267) Setting the makefile targets to point to `pytorch/pytorch-canary` as an example.
v20251001-163637
[autorevert] add job & hud links to the autorevert message and debug…
v20250930-222836
[autorevert] exclude unstable jobs (#7260)
v20250930-134331
[AUTOREVERT] [BUGFIX] fixing typo in variable name preventing revert …