
Commit 9127c8e

mgoin authored and Akshat-Tripathi committed
Fix CompressedTensorsWNA16MoE with grouped scales (vllm-project#13769)
1 parent 3660d88

File tree

1 file changed: 2 additions, 1 deletion

vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py

Lines changed: 2 additions & 1 deletion
@@ -527,7 +527,8 @@ def marlin_moe_permute_scales(s: torch.Tensor, size_k: int,
         replace_tensor("w13_weight_scale", marlin_w13_scales)
         marlin_w2_scales = marlin_moe_permute_scales(
             layer.w2_weight_scale,
-            layer.w2_weight_scale.shape[1] * self.packed_factor,
+            layer.w2_weight_scale.shape[1] *
+            (self.group_size if self.group_size != -1 else self.packed_factor),
             size_k2,
             self.group_size,
             self.num_bits,
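
For context, the fix changes how the original k dimension is recovered from the w2 scale tensor: with grouped quantization (group_size != -1) there is one scale entry along dim 1 per group of group_size input channels, so the unpacked k is the number of scale columns times group_size; only the channelwise case (group_size == -1) keeps the previous packed_factor scaling. Below is a minimal, hypothetical sketch of that calculation, not vLLM code; the helper name w2_scale_size_k and the toy shapes are assumptions made for illustration.

import torch

def w2_scale_size_k(w2_weight_scale: torch.Tensor,
                    group_size: int,
                    packed_factor: int) -> int:
    # Number of scale entries along dim 1 of the w2 scale tensor.
    num_scale_cols = w2_weight_scale.shape[1]
    if group_size != -1:
        # Grouped scales: one scale column per group of `group_size`
        # input channels, so the original k is cols * group_size.
        return num_scale_cols * group_size
    # Channelwise scales: keep the pre-fix behaviour of scaling by the
    # packing factor.
    return num_scale_cols * packed_factor

# Toy usage: 16 scale columns with group_size=128 imply k = 2048,
# while the group_size == -1 fallback gives 16 * packed_factor = 128.
scales = torch.ones(4, 16)
print(w2_scale_size_k(scales, group_size=128, packed_factor=8))  # 2048
print(w2_scale_size_k(scales, group_size=-1, packed_factor=8))   # 128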
