Hugging Face
Aalisha Dalal
aalisha
1 follower · 6 following
AI & ML interests
* Computer Vision
* Deep Learning
Recent Activity
updated a dataset about 1 month ago: aalisha/srmdtranslations
published a dataset about 1 month ago: aalisha/srmdtranslations
reacted to reach-vb's post over 1 year ago:
Smol TTS models are here! OuteTTS-0.1-350M - Zero shot voice cloning, built on LLaMa architecture, CC-BY license! 🔥

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp ⚡

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens

The model is extremely impressive for 350M parameters! Kudos to the OuteAI team on such a brilliant feat - I'd love to see this be applied on larger data and smarter backbones like SmolLM 🤗

Check out the models here: https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c
Organizations
models: 0 (none public yet)
datasets: 1
aalisha/srmdtranslations • Updated May 9 • 174