xuebi committed · Commit 6ff2856
Parent(s): aa487ef

update docs

Files changed:
- docs/sglang_deploy_guide.md +111 -0
- docs/sglang_deploy_guide_cn.md +120 -0
- docs/transformers_deploy_guide.md +92 -0
- docs/transformers_deploy_guide_cn.md +93 -0
- docs/vllm_deploy_guide.md +117 -0
- docs/vllm_deploy_guide_cn.md +127 -0
docs/sglang_deploy_guide.md
ADDED
# MiniMax M2.5 Model SGLang Deployment Guide

[English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)

We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights, plus 240 GB per 1M context tokens

The following are recommended configurations; actual requirements should be adjusted based on your use case:

- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.

> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
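As a rough sanity check on these figures (illustrative only, not official sizing guidance), the recommended capacities follow from the numbers above when combined with the `--mem-fraction-static 0.85` setting used in the launch commands below. The sketch assumes all remaining GPU memory goes to KV Cache and ignores activation and framework overhead:

```python
# Back-of-the-envelope KV Cache capacity estimate (illustrative only).
# Assumes: 220 GB of weights, 240 GB of KV Cache per 1M context tokens,
# and --mem-fraction-static 0.85 as in the launch commands below.
def kv_capacity_tokens(num_gpus: int, gb_per_gpu: int,
                       mem_fraction: float = 0.85,
                       weights_gb: float = 220.0,
                       kv_gb_per_m_tokens: float = 240.0) -> float:
    usable_gb = num_gpus * gb_per_gpu * mem_fraction - weights_gb
    return usable_gb / kv_gb_per_m_tokens * 1_000_000

print(f"{kv_capacity_tokens(4, 96):,.0f} tokens")   # ~443,000 tokens (≈ 400K)
print(f"{kv_capacity_tokens(8, 144):,.0f} tokens")  # ~3,163,000 tokens (≈ 3M)
```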
## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing SGLang in a fresh Python environment:

```bash
uv venv
source .venv/bin/activate
uv pip install sglang
```

Run the following command to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

4-GPU deployment command:

```bash
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2.5 \
    --tp-size 4 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```

8-GPU deployment command:

```bash
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2.5 \
    --tp-size 8 \
    --ep-size 8 \
    --tool-call-parser minimax-m2 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --reasoning-parser minimax-append-think \
    --port 8000 \
    --mem-fraction-static 0.85
```

## Testing Deployment

After startup, you can test the SGLang OpenAI-compatible API with the following command:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
```
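Because the endpoint is OpenAI-compatible, you can also query it from Python. The snippet below is a minimal sketch assuming the `openai` Python package is installed and the server is running on `localhost:8000` as started above; the `api_key` value is a placeholder, since the server does not require one unless you configure it.

```python
from openai import OpenAI

# Point the OpenAI client at the local SGLang server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
print(response.choices[0].message.content)
```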
## Common Issues

### MiniMax-M2 model is not currently supported

Please upgrade SGLang to the latest stable version (>= v0.5.4.post1).
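One way to do this in the uv-based environment from the installation step (a sketch; adjust to your package manager):

```bash
uv pip install -U "sglang>=0.5.4.post1"
```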
## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository

We continuously optimize the deployment experience for our models. Feedback is welcome!
docs/sglang_deploy_guide_cn.md
ADDED
# MiniMax M2.5 Model SGLang Deployment Guide

[English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)

We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights, plus 240 GB per 1M context tokens

The following are recommended configurations; adjust actual requirements to your workload:

- **96G x4 GPU**: Supports a total KV Cache capacity of 400K tokens.
- **144G x8 GPU**: Supports a total KV Cache capacity of up to 3M tokens.

> **Note**: The values above are the maximum aggregate cache capacity the hardware supports. The maximum length of a single sequence remains 196K tokens.

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing SGLang in a fresh Python environment:

```bash
uv venv
source .venv/bin/activate
uv pip install sglang
```

Run the following command to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

4-GPU deployment command:

```bash
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2.5 \
    --tp-size 4 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```

8-GPU deployment command:

```bash
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2.5 \
    --tp-size 8 \
    --ep-size 8 \
    --tool-call-parser minimax-m2 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --reasoning-parser minimax-append-think \
    --port 8000 \
    --mem-fraction-static 0.85
```

## Testing Deployment

After startup, you can test the SGLang OpenAI-compatible API with the following command:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
```

## Common Issues

### Hugging Face Network Issues

If you run into network problems, you can set up a proxy or use a mirror endpoint before pulling the model:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```
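Alternatively, you can pre-download the weights into the local Hugging Face cache before launching the server, so that the download can be retried independently of SGLang. This is a suggested workaround assuming the `huggingface_hub` CLI is available in the environment; it also honors the `HF_ENDPOINT` setting above:

```bash
# Pre-fetch the model into the local Hugging Face cache before starting the server.
huggingface-cli download MiniMaxAI/MiniMax-M2.5
```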
### MiniMax-M2 model is not currently supported

Please upgrade SGLang to the latest stable version (>= v0.5.4.post1).

## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository
- Give feedback through our [official WeChat group](https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg)

We continuously optimize the deployment experience for our models. Feedback is welcome!
docs/transformers_deploy_guide.md
ADDED
# MiniMax M2.5 Model Transformers Deployment Guide

[English Version](./transformers_deploy_guide.md) | [Chinese Version](./transformers_deploy_guide_cn.md)

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- Transformers: 4.57.1
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing Transformers in a fresh Python environment:

```bash
uv pip install transformers==4.57.1 torch accelerate --torch-backend=auto
```

Run the following Python script to generate with the model. Transformers will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "MiniMaxAI/MiniMax-M2.5"

# Load the model and spread its weights across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
    {"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]}
]

# Build the prompt with the chat template and move it to the GPU.
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

generated_ids = model.generate(model_inputs, max_new_tokens=100, generation_config=model.generation_config)

response = tokenizer.batch_decode(generated_ids)[0]

print(response)
```
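For interactive use you may prefer to stream tokens as they are generated. Continuing from the script above (it reuses the `model`, `tokenizer`, and `model_inputs` already defined), the sketch below uses Transformers' built-in `TextStreamer`; it is an illustrative variation, not part of the original guide.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt itself.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    model_inputs,
    max_new_tokens=100,
    generation_config=model.generation_config,
    streamer=streamer,
)
```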
## Common Issues

### Hugging Face Network Issues

If you encounter network issues, you can set up a proxy or use a mirror endpoint before pulling the model.

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

### MiniMax-M2 model is not currently supported

Please check that `trust_remote_code=True` is set when loading the model.

## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository

We continuously optimize the deployment experience for our models. Feedback is welcome!
docs/transformers_deploy_guide_cn.md
ADDED
# MiniMax M2.5 Model Transformers Deployment Guide

[English Version](./transformers_deploy_guide.md) | [Chinese Version](./transformers_deploy_guide_cn.md)

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- Transformers: 4.57.1
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing Transformers in a fresh Python environment:

```bash
uv pip install transformers==4.57.1 torch accelerate --torch-backend=auto
```

Run the following Python script to generate with the model. Transformers will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "MiniMaxAI/MiniMax-M2.5"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
    {"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]}
]

model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

generated_ids = model.generate(model_inputs, max_new_tokens=100, generation_config=model.generation_config)

response = tokenizer.batch_decode(generated_ids)[0]

print(response)
```

## Common Issues

### Hugging Face Network Issues

If you encounter network issues, you can set up a proxy or use a mirror endpoint before pulling the model.

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

### MiniMax-M2 model is not currently supported

Please make sure `trust_remote_code=True` is enabled when loading the model.

## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository
- Give feedback through our [official WeChat group](https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg)

We continuously optimize the deployment experience for our models. Feedback is welcome!
docs/vllm_deploy_guide.md
ADDED
# MiniMax M2.5 Model vLLM Deployment Guide

[English Version](./vllm_deploy_guide.md) | [Chinese Version](./vllm_deploy_guide_cn.md)

We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights, plus 240 GB per 1M context tokens

The following are recommended configurations; actual requirements should be adjusted based on your use case:

- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.

> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing vLLM in a fresh Python environment:

```bash
uv venv
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
```

Run the following command to start the vLLM server. vLLM will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

4-GPU deployment command:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think
```

8-GPU deployment command:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --enable_expert_parallel --tensor-parallel-size 8 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think
```

## Testing Deployment

After startup, you can test the vLLM OpenAI-compatible API with the following command:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
```
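Since the server is launched with `--enable-auto-tool-choice` and `--tool-call-parser minimax_m2`, you can also exercise tool calling through the same OpenAI-compatible API. The snippet below is a minimal sketch assuming the `openai` Python package is installed; `get_weather` is a hypothetical tool defined only for illustration, and the `api_key` value is a placeholder.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool schema, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.5",
    messages=[{"role": "user", "content": "What is the weather in Shanghai right now?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```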
## Common Issues

### MiniMax-M2 model is not currently supported

This vLLM version is outdated. Please upgrade to the latest version.

### torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Add `--compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"` to the startup parameters to resolve this issue. For example:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --enable_expert_parallel --tensor-parallel-size 8 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
```

### Output is garbled

If you encounter corrupted output when using vLLM to serve these models, upgrade to a nightly build (make sure it includes commit [cf3eacfe58fa9e745c2854782ada884a9f992cf7](https://github.com/vllm-project/vllm/commit/cf3eacfe58fa9e745c2854782ada884a9f992cf7)).
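One way to install a nightly build is sketched below; the exact command and index URL may change over time, so verify it against the current vLLM installation documentation before use.

```bash
# Install a vLLM nightly wheel into the active virtual environment.
uv pip install -U vllm \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly
```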
## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository

We continuously optimize the deployment experience for our models. Feedback is welcome!
docs/vllm_deploy_guide_cn.md
ADDED
# MiniMax M2.5 Model vLLM Deployment Guide

[English Version](./vllm_deploy_guide.md) | [Chinese Version](./vllm_deploy_guide_cn.md)

We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.

## Applicable Models

This document applies to the following models. You only need to change the model name during deployment.

- [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

The deployment process is illustrated below using MiniMax-M2.5 as an example.

## System Requirements

- OS: Linux
- Python: 3.9 - 3.12
- GPU:
  - Compute capability 7.0 or higher
  - Memory requirements: 220 GB for weights, plus 240 GB per 1M context tokens

The following are recommended configurations; adjust actual requirements to your workload:

- **96G x4 GPU**: Supports a total KV Cache capacity of 400K tokens.
- **144G x8 GPU**: Supports a total KV Cache capacity of up to 3M tokens.

> **Note**: The values above are the maximum aggregate cache capacity the hardware supports. The maximum length of a single sequence remains 196K tokens.

## Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing vLLM in a fresh Python environment:

```bash
uv venv
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
```

Run the following command to start the vLLM server. vLLM will automatically download and cache the MiniMax-M2.5 model from Hugging Face.

4-GPU deployment command:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think
```

8-GPU deployment command:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --enable_expert_parallel --tensor-parallel-size 8 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think
```

## Testing Deployment

After startup, you can test the vLLM OpenAI-compatible API with the following command:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
```

## Common Issues

### Hugging Face Network Issues

If you run into network problems, you can set up a proxy or use a mirror endpoint before pulling the model:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

### MiniMax-M2 model is not currently supported

This vLLM version is outdated. Please upgrade to the latest version.

### torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Add `--compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"` to the startup parameters to resolve this issue. For example:

```bash
SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2.5 --trust-remote-code \
    --enable_expert_parallel --tensor-parallel-size 8 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
```

### Output is garbled

If you encounter garbled output when running these models with vLLM, upgrade to a nightly build (make sure it includes commit [cf3eacfe58fa9e745c2854782ada884a9f992cf7](https://github.com/vllm-project/vllm/commit/cf3eacfe58fa9e745c2854782ada884a9f992cf7)).

## Getting Support

If you encounter any issues while deploying the MiniMax model:

- Contact our technical support team through official channels such as email at [model@minimax.io](mailto:model@minimax.io)
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository
- Give feedback through our [official WeChat group](https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg)

We continuously optimize the deployment experience for our models. Feedback is welcome!