VibeVoice-ASR_LoRA_Hungarian_v1

This repository contains a LoRA (Low-Rank Adaptation) adapter for the VibeVoice-ASR model. The adapter was fine-tuned on approximately 500 hours of speech data to improve Hungarian transcription accuracy.

Performance Comparison

Evaluated on 1,000 samples from CommonVoice 17, the adapter substantially reduces word error rate (WER) compared with the base model:

Metric	Base Model (without LoRA)	This Model (with LoRA)
Raw WER	53.02%	19.25%
Normalized WER	48.67%	15.90%
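
The card does not say how the raw and normalized figures were computed, so the following is only a minimal sketch of one common way to obtain both numbers. It assumes WER is the word-level edit distance divided by the number of reference words, and that normalization means lowercasing and stripping punctuation. The `normalize` rules, the function names, and the sample sentences are illustrative assumptions, not the evaluation script behind the table.

```python
import re
import unicodedata


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed normalization: NFC form, lowercase, punctuation replaced by spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w keeps accented letters such as á, ő, ű
    return " ".join(text.split())


def wer(references, hypotheses, normalizer=None):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalizer is not None:
            ref, hyp = normalizer(ref), normalizer(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return errors / total_words


# Made-up example: the hypothesis differs from the reference only in case and punctuation.
references = ["Jó reggelt, világ!"]
hypotheses = ["jó reggelt világ"]

raw_wer = wer(references, hypotheses)
normalized_wer = wer(references, hypotheses, normalizer=normalize)
print(f"Raw WER: {raw_wer:.2%}, Normalized WER: {normalized_wer:.2%}")
```

In this toy case every word counts as an error before normalization and none afterwards, which illustrates why the normalized figures in the table are lower than the raw ones: casing and punctuation mismatches are no longer penalized.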

Inference

For inference, please refer to the official Microsoft repository: https://github.com/microsoft/VibeVoice

Non-Commercial Use Only

Because of the licensing and characteristics of the dataset used for fine-tuning, commercial use of this model is prohibited. It is intended for research and evaluation only.



