DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching

DiFlow-TTS is trained on 470 hours of the LibriTTS dataset, which consists of predominantly neutral speech. As a result, it may not perform well on prompts with strong emotional expression.

Download the DiFlow-TTS checkpoint and place it as follows:

root/
└── ckpts/
    └── diflow-tts.ckpt
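
Before running anything that loads the model, it can help to confirm the checkpoint sits where the layout above expects it. The following is a minimal sketch, not part of the DiFlow-TTS codebase: the helper name `find_checkpoint` is hypothetical, and it assumes `root` is the repository root shown in the tree.

```python
from pathlib import Path

# Relative location taken from the directory layout above.
CKPT_RELATIVE = Path("ckpts") / "diflow-tts.ckpt"


def find_checkpoint(root="."):
    """Return the checkpoint path under `root`, or raise if the file is missing.

    `find_checkpoint` is a hypothetical helper for illustration only;
    it is not an API provided by DiFlow-TTS.
    """
    ckpt = Path(root) / CKPT_RELATIVE
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Expected the DiFlow-TTS checkpoint at {ckpt}; "
            "download it and place it under ckpts/ first."
        )
    return ckpt
```

Calling `find_checkpoint()` from the repository root returns the path to `ckpts/diflow-tts.ckpt` if the file is present, and otherwise fails early with a message that names the expected location.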

Paper for Fsoft-AIC/DiFlowTTS

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech (Paper 2509.09631, published Sep 11, 2025)