EurekaTian/ROMA
Video-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
multimodal
video-understanding
audio-understanding
streaming
real-time
omni-modal
arxiv: 2601.10323
License: apache-2.0
Commit History (branch: main)
Update README.md (cc245ea, verified), EurekaTian committed on Jan 19
Update README.md (9756644, verified), EurekaTian committed on Jan 16
Update README.md (6cdb469, verified), EurekaTian committed on Jan 15
Update README.md (e5dff69, verified), EurekaTian committed on Jan 15
Upload architecture.png (6cbc0e8, verified), EurekaTian committed on Jan 15
Update README.md (936754e, verified), EurekaTian committed on Jan 15
Upload folder using huggingface_hub (789f831, verified), EurekaTian committed on Jan 15
initial commit (bada6e1, verified), EurekaTian committed on Jan 15