DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech
Paper • 2509.09631 • Published
DiFlow-TTS is trained on 470 hours of the LibriTTS dataset, which consists of predominantly neutral speech. As a result, it may not perform well on prompts with strong emotional expression.
Download the DiFlow-TTS checkpoint and place it as follows:
root/
└── ckpts/
    └── diflow-tts.ckpt
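
Before running inference, it can help to confirm that the checkpoint sits where the layout above expects it. The following is a minimal sketch, not part of the DiFlow-TTS code base: the helper name `find_checkpoint` is hypothetical, and `root` is assumed to be the top-level directory of the repository.

```python
from pathlib import Path


def find_checkpoint(root: str = ".") -> Path:
    """Return the expected checkpoint path under `root`, or raise if it is missing.

    Hypothetical helper for illustration; it only checks the file location
    root/ckpts/diflow-tts.ckpt and does not load or validate the checkpoint.
    """
    ckpt = Path(root) / "ckpts" / "diflow-tts.ckpt"
    if not ckpt.is_file():
        raise FileNotFoundError(f"Expected checkpoint at {ckpt}")
    return ckpt
```

Calling `find_checkpoint(".")` from the repository root either returns the path or fails early with a message naming the missing file.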