v1.0.1
Highlights
- DeepSeek-R1 Multi-Node H100 Deployment Support
- FlashInfer Integration
- XGrammar Integration
 
What's Changed
- Benchclient by @shihaobai in #740
- fix pause reqs by @shihaobai in #741
- add RETURN_LIST for tgi_api by @shihaobai in #742
- fix: fix a precision bug in the context_flashattention by @blueswhen in #743
- Improve the accuracy of deepseekv3 by @hiworldwzj in #744
- deepseekv3 bmm noquant and fix moe gemm bug. by @hiworldwzj in #745
- Add Xgrammar Support by @flyinglandlord in #701
- fuse fp8 quant in kv copying and add flashinfer decode mla operator in the attention module by @blueswhen in #737
- fix: add flashinfer-python in the requirements.txt by @blueswhen in #749
- Fix tokens2 by @SangChengC in #748
- Fix Unit-test in PR: Add xgrammar by @flyinglandlord in #750
- add support for multinode tp by @shihaobai in #751
 
Full Changelog: v1.0.0...v1.0.1