PaLI-X: On Scaling up a Multilingual Vision and Language Model Paper • 2305.18565 • Published May 29, 2023
Improve Supervised Representation Learning with Masked Image Modeling Paper • 2312.00950 • Published Dec 1, 2023
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023
Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations Paper • 2309.01858 • Published Sep 4, 2023
CoCa: Contrastive Captioners are Image-Text Foundation Models Paper • 2205.01917 • Published May 4, 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model Paper • 2209.06794 • Published Sep 14, 2022
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published Mar 10, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7, 2025
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24, 2025
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment Paper • 2604.12012 • Published Apr 13
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini Paper • 2605.27295 • Published 7 days ago