Describe what you see in a webcam feed or video with AI
Generate speech from text using reference audio
Launch a web interface for text-to-speech and SSML processing
A Step Towards Music Generation Foundation Model