tsinghua-ee/video_SALMONN2plus_3B_audioAlign

Audio-aligned version of the video-SALMONN 2+ 3B model.

Downloads last month: 10

Safetensors
Model size: 5B params
Tensor type: I64 · BF16

Model tree for tsinghua-ee/video_SALMONN2plus_3B_audioAlign
Base model: Qwen/Qwen2.5-VL-3B-Instruct
Finetuned (688): this model

Collection including tsinghua-ee/video_SALMONN2plus_3B_audioAlign

video-SALMONN 2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. • 11 items • Updated 7 days ago