Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought!
YM Qin
Wakals
AI & ML interests
Computer Vision, Vision-language Model, Generative Model
Recent Activity
Upvoted a collection about 14 hours ago: MMFineReason
Upvoted a paper 7 days ago: VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Liked a dataset 22 days ago: DietCoke4671/ToolVQA
Organizations
None yet