Model weight shape mismatch
Hi there,
We are hitting the issue below when running the model on MI300X with the suggested vLLM version.
It reports that the parameter shape and the loaded weight shape do not match:
assert param_data.shape == loaded_weight.shape
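For context, that assertion fires in the weight loader when a parameter the model allocates and the tensor read from the checkpoint disagree in shape. A minimal, self-contained sketch of the fused-projection case (the sizes and tensor names here are purely illustrative, not taken from the actual model or from vLLM's loader):

import torch

hidden = 4096
# The model allocates one fused qkv_proj parameter ...
qkv_weight = torch.empty(3 * hidden, hidden)
# ... while the quantized checkpoint stores q/k/v as separate tensors.
q_weight = torch.empty(hidden, hidden)

# This is the kind of check that trips during loading:
assert qkv_weight.shape == q_weight.shape, (
    f"shape mismatch: {qkv_weight.shape} vs {q_weight.shape}"
)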
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --shm-size 16G \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --cap-add=SYS_PTRACE \
  --env VLLM_ROCM_USE_AITER=1 \
  --env VLLM_DISABLE_COMPILE_CACHE=1 \
  -p 8000:8000 \
  -d \
  rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210 \
  bash -c "
    python3 -m vllm.entrypoints.openai.api_server \
      --model amd/MiniMax-M2.1-MXFP4 \
      --gpu-memory-utilization 0.95 \
      --max-model-len 196608 \
      --kv-cache-dtype fp8 \
      --enable-chunked-prefill false \
      --tool-call-parser minimax_m2 \
      --reasoning-parser minimax_m2_append_think \
      --quantization quark \
      --trust_remote_code \
      --enable-auto-tool-choice \
      --host 0.0.0.0 \
      --port 8000"
Any chance you could give us some suggestions on how to fix it?
Hi, this is a model support issue in vLLM for MiniMax-M2.
Could you please try a patch that adds the packed_modules_mapping to MiniMaxM2Model?
Take this as a reference: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/minimax_vl_01.py#L182
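Roughly, packed_modules_mapping tells the loader and the quantization config which per-module tensors in the checkpoint are packed into one fused layer, so each shard gets copied into its slice of the fused parameter instead of being shape-compared one-to-one. A simplified sketch of the idea (this is not vLLM's actual loader; the helper name and sizes are made up for illustration):

import torch

packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}

def load_packed(fused_param, shards, members):
    # Copy each per-module checkpoint tensor into its slice of the
    # fused parameter instead of comparing shapes one-to-one.
    offset = 0
    for name in members:
        w = shards[name]
        fused_param[offset:offset + w.shape[0]].copy_(w)
        offset += w.shape[0]

# Example: three (n, k) checkpoint shards fill one (3n, k) fused weight.
n, k = 128, 64
shards = {m: torch.randn(n, k) for m in packed_modules_mapping["qkv_proj"]}
qkv = torch.empty(3 * n, k)
load_packed(qkv, shards, packed_modules_mapping["qkv_proj"])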
Sorry for the late update. The patch you recommended does not work (not sure if I did something wrong).
Patched code
@support_torch_compile
class MiniMaxM2Model(nn.Module):
+ packed_modules_mapping = {
+ "qkv_proj": ["q_proj", "k_proj", "v_proj"],
+ "gate_up_proj": ["gate_proj", "up_proj"],
+ }
Tested docker images:
rocm/vllm:rocm7.0.0_vllm0.11.2_20251210
rocm/vllm:v0.14.0_amd_dev
Would it be possible for you to provide a patch so that this model can run on the MI3XX series? I think it would be a huge boost for others onboarding onto the AMD MI3XX series.