OpenMOSS-Team/MOSS-TTS
Text-to-Speech • 8B • Updated • 43.3k • 394
A unified multimodal understanding and generation model.
Audio-Conditioned LipSync with Latent Diffusion Models
Generate new person images with swapped clothes or poses
Upgraded to v1.0!
Quickly edit the expression of a face