patrickvonplaten committed
Commit b3de5c8 · verified · 1 Parent(s): f74a921

Update README.md

Files changed (1):
  1. README.md +11 -6
README.md CHANGED
@@ -118,7 +118,7 @@ Voxtral Mini 4B Realtime is competitive to leading offline models and shows sign
 
 The model can also be deployed with the following libraries:
 - [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended)
-- [`transformers (WIP)`](https://github.com/huggingface/transformers): See [here](#transformers)
+- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
 - *Community Contributions*: See [here](#community-contributions-untested)
 
 ### vLLM (recommended)
@@ -214,20 +214,25 @@ Make sure to have `mistral-common` installed with audio dependencies:
 pip install --upgrade "mistral-common[audio]"
 ```
 
+#### Usage
+
 ```python
-import torch
 from transformers import VoxtralRealtimeForConditionalGeneration, AutoProcessor
-from datasets import load_dataset
+from mistral_common.tokens.tokenizers.audio import Audio
+from huggingface_hub import hf_hub_download
 
 repo_id = "mistralai/Voxtral-Mini-4B-Realtime-2602"
 
 processor = AutoProcessor.from_pretrained(repo_id)
 model = VoxtralRealtimeForConditionalGeneration.from_pretrained(repo_id, device_map="auto")
 
-ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-audio = ds[0]["audio"]["array"]
+repo_id = "patrickvonplaten/audio_samples"
+audio_file = hf_hub_download(repo_id=repo_id, filename="bcn_weather.mp3", repo_type="dataset")
+
+audio = Audio.from_file(audio_file, strict=False)
+audio.resample(processor.feature_extractor.sampling_rate)
 
-inputs = processor(audio, return_tensors="pt")
+inputs = processor(audio.audio_array, return_tensors="pt")
 inputs = inputs.to(model.device, dtype=model.dtype)
 
 outputs = model.generate(**inputs)
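The key new step in the updated snippet is resampling the loaded audio to `processor.feature_extractor.sampling_rate` before feature extraction. As a rough illustration of what resampling does (a toy linear-interpolation stand-in, not `mistral_common`'s actual `Audio.resample` implementation, which uses proper signal-processing filters):

```python
# Toy linear-interpolation resampler, for illustration only.
# Real resamplers low-pass filter to avoid aliasing; this one does not.
def resample(samples, src_rate, dst_rate):
    if src_rate == dst_rate:
        return list(samples)
    # Number of output samples after the rate change.
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i on the source sample axis.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Linearly interpolate between the two nearest source samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Halving the rate (32 kHz -> 16 kHz) roughly halves the sample count.
signal = [0.0, 1.0, 0.0, -1.0]
resampled = resample(signal, 32000, 16000)
print(len(resampled))  # 2
```

This is only meant to show why the resample call sits before `processor(...)`: the feature extractor expects audio at its own sampling rate, so the waveform's sample count must be rescaled to match.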