Aalisha Dalal

aalisha

AI & ML interests

* Computer Vision
* Deep Learning

Recent Activity

updated a dataset about 1 month ago

aalisha/srmdtranslations

published a dataset about 1 month ago

aalisha/srmdtranslations

reacted to reach-vb's post with 👀 over 1 year ago

Smol TTS models are here! OuteTTS-0.1-350M - Zero shot voice cloning, built on LLaMa architecture, CC-BY license! 🔥

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp ⚡

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens

The model is extremely impressive for 350M parameters! Kudos to the OuteAI team on such a brilliant feat - I'd love to see this be applied on larger data and smarter backbones like SmolLM 🤗

Check out the models here: https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c

models 0

None public yet

datasets 1

aalisha/srmdtranslations

Updated May 9 • 174