video-SALMONN 2
Collection
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions.
Audio-aligned version of the video-SALMONN 2+ 3B model.
Base model
Qwen/Qwen2.5-VL-3B-Instruct