Update README.md
README.md
CHANGED
@@ -118,7 +118,7 @@ Voxtral Mini 4B Realtime is competitive to leading offline models and shows sign
 
 The model can also be deployed with the following libraries:
 - [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended)
-- [`transformers
+- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
 - *Community Contributions*: See [here](#community-contributions-untested)
 
 ### vLLM (recommended)
@@ -214,20 +214,25 @@ Make sure to have `mistral-common` installed with audio dependencies:
 pip install --upgrade "mistral-common[audio]"
 ```
 
+#### Usage
+
 ```python
-import torch
 from transformers import VoxtralRealtimeForConditionalGeneration, AutoProcessor
-from
+from mistral_common.tokens.tokenizers.audio import Audio
+from huggingface_hub import hf_hub_download
 
 repo_id = "mistralai/Voxtral-Mini-4B-Realtime-2602"
 
 processor = AutoProcessor.from_pretrained(repo_id)
 model = VoxtralRealtimeForConditionalGeneration.from_pretrained(repo_id, device_map="auto")
 
-
-
+repo_id = "patrickvonplaten/audio_samples"
+audio_file = hf_hub_download(repo_id=repo_id, filename="bcn_weather.mp3", repo_type="dataset")
+
+audio = Audio.from_file(audio_file, strict=False)
+audio.resample(processor.feature_extractor.sampling_rate)
 
-inputs = processor(audio, return_tensors="pt")
+inputs = processor(audio.audio_array, return_tensors="pt")
 inputs = inputs.to(model.device, dtype=model.dtype)
 
 outputs = model.generate(**inputs)
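
Note on the added `audio.resample(...)` line: the new usage snippet resamples the loaded audio to the feature extractor's sampling rate before calling the processor. The sketch below illustrates what that step does to a waveform, using plain linear interpolation as a stand-in. It is not how `mistral-common` implements resampling. The helper name `resample_linear` and the 44100 and 16000 Hz rates are assumptions chosen for illustration, not values taken from this change.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal (list of floats) by linear interpolation.

    Hypothetical helper for illustration only; real audio libraries
    typically use band-limited filters rather than linear interpolation.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample axis.
        pos = i * src_rate / dst_rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring input samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at an assumed source rate of 44100 Hz,
# converted to an assumed target rate of 16000 Hz.
one_second = [0.0] * 44100
converted = resample_linear(one_second, 44100, 16000)
```

The duration is preserved while the sample count changes, which is why the snippet in the diff passes `audio.audio_array` to the processor only after the resampling call.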