This repository contains a LoRA (Low-Rank Adaptation) adapter for the VibeVoice-ASR model. The adapter was fine-tuned on approximately 500 hours of speech data to improve transcription accuracy.
On an evaluation set of 1,000 samples from Common Voice 17, the adapter lowers word error rate (WER) by roughly 64% relative to the base model on raw text and by roughly 67% on normalized text:
| Metric | Base Model (without LoRA) | This Model (with LoRA) |
|---|---|---|
| Raw WER | 53.02% | 19.25% |
| Normalized WER | 48.67% | 15.90% |
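The difference between the raw and normalized figures comes from text normalization applied before scoring. The sketch below illustrates how the two metrics are typically computed; it is not the evaluation script used for this model. The `normalize` function (lowercasing and punctuation stripping) is an assumption, since the normalization actually used is not specified here, and the sample sentences are made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep letters, digits, apostrophes
    return " ".join(text.split())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses, normalizer=None) -> float:
    """Total word errors divided by total reference words across all pairs."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalizer is not None:
            ref, hyp = normalizer(ref), normalizer(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words if words else 0.0


# Illustrative data only, not taken from the evaluation set.
references = ["Hello, world!", "The cat sat."]
hypotheses = ["hello world", "the cat sat down"]

raw_wer = corpus_wer(references, hypotheses)
normalized_wer = corpus_wer(references, hypotheses, normalizer=normalize)
print(f"Raw WER: {raw_wer:.2%}")
print(f"Normalized WER: {normalized_wer:.2%}")
```

In this toy example, casing and punctuation mismatches count as errors in the raw score but vanish after normalization, which is why normalized WER is the lower of the two numbers.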
For inference, please refer to the official Microsoft repository: https://github.com/microsoft/VibeVoice
Because of the licensing terms and characteristics of the dataset used for fine-tuning, this model may not be used commercially. It is intended for research and evaluation only.
Base model: microsoft/VibeVoice-ASR