Pull requests: NVIDIA/nvidia-resiliency-ext

Pull requests list

Auto restart [ci-approved]
#139 opened Aug 6, 2025 by hexinw-nvidia (Draft)
checkpointing: fix error propagation and add test
#138 opened Aug 2, 2025 by diegs
Add example for multimodal models [ci-approved]
#131 opened Jul 25, 2025 by Ava-A4098
Add ability to temporarily disable hang protection [ci-approved]
#129 opened Jul 23, 2025 by rhewett-nv
Inprocess doc clarification [ci-approved]
#119 opened Jul 15, 2025 by rhewett-nv
Added in-process wrapper restart latency
#118 opened Jul 13, 2025 by namitdhameja
updating fork to spawn [ci-approved]
#102 opened Jul 1, 2025 by aartibasant
Test UT. [ci-approved]
#79 opened May 17, 2025 by hexinw-nvidia (Draft)
Use process group with compatible backends
#26 opened Apr 3, 2025 by ajayvohra2005