VibeVoice-ASR_LoRA_Hungarian_v1

This repository contains a LoRA (Low-Rank Adaptation) adapter for the VibeVoice-ASR model. The adapter was fine-tuned on approximately 500 hours of speech data to improve the model's Hungarian transcription accuracy.

Performance Comparison

Evaluated on 1,000 samples from CommonVoice 17, the adapter substantially lowers the word error rate (WER) compared with the base model:

| Metric | Base model (without LoRA) | This model (with LoRA) |
| --- | --- | --- |
| Raw WER | 53.02% | 19.25% |
| Normalized WER | 48.67% | 15.90% |
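
For readers unfamiliar with the two metrics, the following is a minimal sketch of how raw and normalized WER can be computed. It is not the evaluation script used for the numbers above. The normalization shown (lowercasing, punctuation removal, whitespace collapsing) and the corpus-level aggregation are assumptions, and the function names and sample sentences are purely illustrative.

```python
import re
import unicodedata


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    # Assumed normalization (not confirmed by the model card):
    # lowercase, replace punctuation with spaces, collapse whitespace.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def wer(references, hypotheses, normalizer=None):
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalizer is not None:
            ref, hyp = normalizer(ref), normalizer(hyp)
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        edits += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words


# Illustrative sentences only, not taken from the evaluation set.
refs = ["Jó reggelt kívánok!", "Hol van az állomás?"]
hyps = ["jó reggelt kívánok", "Hol van a állomás"]

raw_wer = wer(refs, hyps)
normalized_wer = wer(refs, hyps, normalizer=normalize)
print(f"Raw WER: {raw_wer:.2%}")                # Raw WER: 57.14%
print(f"Normalized WER: {normalized_wer:.2%}")  # Normalized WER: 14.29%
```

In this toy example the raw score counts casing and punctuation mismatches as errors, while the normalized score counts only the one genuinely wrong word. The same effect explains why the normalized figures in the table are lower than the raw ones.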

Inference

For inference, please refer to the official Microsoft repository: https://github.com/microsoft/VibeVoice

Non-Commercial Use Only

Because of the licensing terms and characteristics of the dataset used for fine-tuning, commercial use of this model is prohibited. It is intended for research and evaluation only.
