Simple Self-Distillation
Recent Activity
Papers
TopoPrimer: The Missing Topological Context in Forecasting Models
Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
Team members 743 private
Efficient Vision Encoding for Vision Language Models
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 77
- FastVLM WebGPU
  446 • Real-time video captioning powered by FastVLM
- apple/FastVLM-0.5B
  Text Generation • 0.8B • Updated • 5.37k • 394
- apple/FastVLM-1.5B
  Text Generation • 2B • Updated • 2.44k • 80

- apple/coreml-depth-anything-v2-small
  Depth Estimation • Updated • 815 • 100
- apple/coreml-depth-anything-small
  Depth Estimation • Updated • 43 • 39
- apple/coreml-detr-semantic-segmentation
  Image Segmentation • Updated • 102 • 33
- apple/coreml-FastViT-T8
  Image Classification • Updated • 16 • 18
Benchmark for the design of efficient continual learning of image-text models over years.
- TiC-CLIP: Continual Training of CLIP Models
  Paper • 2310.16226 • Published • 10
- apple/TiC-DataComp
  Preview • Updated • 1.76k • 4
- apple/TiC-CLIP-basic-cumulative
  Zero-Shot Image Classification • Updated • 39 • 3
- apple/TiC-CLIP-basic-oracle
  Zero-Shot Image Classification • Updated • 3 • 1

- apple/coreml-stable-diffusion-mixed-bit-palettization
  Updated • 10 • 30
- apple/coreml-stable-diffusion-xl-base
  Text-to-Image • Updated • 44 • 70
- apple/coreml-stable-diffusion-2-1-base
  Text-to-Image • Updated • 141 • 55
- pcuenq/coreml-stable-diffusion-2-1-base
  Text-to-Image • Updated • 14 • 4
AIM: Autoregressive Image Models
CLaRa models
MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B
A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint.
- apple/aimv2-large-patch14-224
  Image Feature Extraction • 0.3B • Updated • 1.1k • 62
- apple/aimv2-huge-patch14-224
  Image Feature Extraction • 0.7B • Updated • 216 • 13
- apple/aimv2-1B-patch14-224
  Image Feature Extraction • 1B • Updated • 147 • 8
- apple/aimv2-3B-patch14-224
  Image Feature Extraction • 3B • Updated • 244 • 4

- apple/OpenELM-270M-Instruct
  Text Generation • 0.3B • Updated • 1.19k • 146
- apple/OpenELM-450M-Instruct
  Text Generation • 0.5B • Updated • 1.18k • 51
- apple/OpenELM-1_1B-Instruct
  Text Generation • 1B • Updated • 1.58M • 76
- apple/OpenELM-3B-Instruct
  Text Generation • 3B • Updated • 1.48k • 340
MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities.
DataCompDR: Improved datasets for training image-text SOTA models.
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published • 8
- apple/mobileclip_s0_timm
  Image Classification • Updated • 269 • 12
- apple/mobileclip_s1_timm
  Image Classification • Updated • 38 • 3
- apple/mobileclip_s2_timm
  Image Classification • Updated • 109 • 6
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- apple/DepthPro-hf
  Depth Estimation • 1.0B • Updated • 41.8k • 107
- apple/DepthPro
  Depth Estimation • Updated • 5.42k • 518
- apple/DepthPro-mixin
  Depth Estimation • 1.0B • Updated • 21 • 8
- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B • Updated • 12.1M • 2.04k
CLIP Models trained using DFN-2B/DFN-5B datasets
DCLM Models + Datasets