[v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp #4205
What this PR does / why we need it?
Currently, the default `cudagraph_capture_size` in vLLM is `[1, 2, 4, 8, 16, 24, ..., max_capture_size]`. However, this is not the best choice in every situation. This PR changes the default when running Qwen3-MoE in a full-DP setting (`dp_size > 1 && tp_size == 1`), which is usually applied in large-scale EP.
old: `[1, 2, 4, 8, 16, 24, ..., max_capture_size]`
new: `[1, 2, 5, 10, 15, 16, 24, ..., max_capture_size]`
This is mainly because the performance of the `_npu_paged_attention` op degrades dramatically with the old sizes. We hope this provides better performance when users do not set a specific `cudagraph_capture_size`.
Does this PR introduce any user-facing change?
The default `cudagraph_capture_size` is modified in the cases above. However, if `cudagraph_capture_size` has already been set by the user, this PR has no effect.
How was this patch tested?
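No test procedure is recorded in this description. As a rough sketch of the behavior a check would need to cover, the snippet below models the selection rule described above. It is not the PR's actual code: the helper `default_capture_sizes`, the `is_qwen3_moe` flag, and the `user_sizes` parameter are hypothetical names, and the stride of 8 from 16 upward is an assumption inferred from the `16, 24, ...` pattern in the lists.

```python
from typing import List, Optional, Sequence

# Small sizes taken from the "old" and "new" lists in the description.
OLD_HEAD = [1, 2, 4, 8]
NEW_HEAD = [1, 2, 5, 10, 15]


def default_capture_sizes(
    max_capture_size: int,
    dp_size: int,
    tp_size: int,
    is_qwen3_moe: bool,
    user_sizes: Optional[Sequence[int]] = None,
) -> List[int]:
    """Hypothetical helper: pick capture sizes following the rule in this PR."""
    # Sizes set explicitly by the user are never overridden.
    if user_sizes is not None:
        return list(user_sizes)

    # "Full dp" as defined in the description: dp_size > 1 and tp_size == 1.
    full_dp = dp_size > 1 and tp_size == 1
    head = NEW_HEAD if (is_qwen3_moe and full_dp) else OLD_HEAD

    # Assumption: sizes from 16 upward advance in steps of 8.
    tail = range(16, max_capture_size + 1, 8)
    return [s for s in head if s <= max_capture_size] + list(tail)
```

With `max_capture_size=32`, this yields the new list for Qwen3-MoE under full DP, the old list otherwise, and returns user-provided sizes unchanged.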