---
license: cc-by-nc-4.0
datasets:
- uoft-cs/cifar10
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- cifar10
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/mZz2vZy1IENHbtmXm1lUe.png)

# **MetaCLIP-2-Cifar10**

> **MetaCLIP-2-Cifar10** is an image classification vision–language encoder model fine-tuned from **facebook/metaclip-2-worldwide-s16** for a single-label classification task.
> It is designed to identify and categorize images into the ten CIFAR-10 object classes using the **MetaClip2ForImageClassification** architecture.

> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

```
Classification report:

              precision    recall  f1-score   support

    airplane     0.9813    0.9685    0.9748      2000
  automobile     0.9777    0.9850    0.9813      2000
        bird     0.9560    0.9560    0.9560      2000
         cat     0.9104    0.9395    0.9247      2000
        deer     0.9566    0.9580    0.9573      2000
         dog     0.9476    0.9215    0.9343      2000
        frog     0.9774    0.9735    0.9755      2000
       horse     0.9704    0.9670    0.9687      2000
        ship     0.9782    0.9890    0.9836      2000
       truck     0.9774    0.9735    0.9755      2000

    accuracy                         0.9631     20000
   macro avg     0.9633    0.9632    0.9632     20000
weighted avg     0.9633    0.9631    0.9632     20000
```

![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/dr7B2yAcfNEJ6ScY6XNC5.png)

---

The model classifies images into the following categories:

* **Class 0:** airplane
* **Class 1:** automobile
* **Class 2:** bird
* **Class 3:** cat
* **Class 4:** deer
* **Class 5:** dog
* **Class 6:** frog
* **Class 7:** horse
* **Class 8:** ship
* **Class 9:** truck

# **Run with Transformers**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)


def cifar10_classification(image):
    """Predicts the CIFAR-10 class represented in an image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Class index -> class name
    labels = {
        "0": "airplane",
        "1": "automobile",
        "2": "bird",
        "3": "cat",
        "4": "deer",
        "5": "dog",
        "6": "frog",
        "7": "horse",
        "8": "ship",
        "9": "truck"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions


# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```

# **Sample Inference:**

![Screenshot 2025-11-15 at 08-21-23 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/vPnT4-Imqykvjll9t5aYC.png)

![Screenshot 2025-11-15 at 08-26-25 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1vRKZKk8mWIhw4IV_DZYV.png)

![Screenshot 2025-11-15 at 08-22-10 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/72idt8H-cjX2pLOOTgNxZ.png)

![Screenshot 2025-11-15 at 08-22-41 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VEE08FlRAaSzCaOyq6135.png)

![Screenshot 2025-11-15 at 08-23-53 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/SFjNL9AIkL0myJ2HSrjfk.png)

![Screenshot 2025-11-15 at 08-24-30 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/6M8Z5PlbD1QSJ5Sbdo1u-.png)

![Screenshot 2025-11-15 at 08-25-04 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jNv67l2-M3c_TYmwGg25f.png)

# **Intended Use:**

The **MetaCLIP-2-Cifar10** model is designed for object classification across the ten CIFAR-10 categories. Potential use cases include:

* **Educational & Research Applications:** Benchmarking experiments, model comparison, and deep learning studies.
* **Lightweight Vision Systems:** Useful for systems requiring simple object recognition.
* **Dataset Exploration:** Assisting in data inspection, annotation, and visualization.
* **Prototype Systems:** Ideal for rapid prototyping in classification pipelines.
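
For prototype pipelines that only need the few most likely classes, the raw logits can be post-processed without Gradio. The snippet below is a minimal, dependency-free sketch and not part of the released model code: `CIFAR10_LABELS`, `softmax`, `top_k_labels`, and the example logits are hypothetical names and values introduced here for illustration. The label order is copied from the class list in this card.

```python
import math

# Class index -> class name, in the order listed in this card.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(CIFAR10_LABELS):
        raise ValueError(f"expected {len(CIFAR10_LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    ranked = sorted(zip(CIFAR10_LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a single image (illustrative, not real model output).
example_logits = [0.1, -1.2, 0.3, 4.0, 0.5, 2.5, -0.4, 0.0, -2.0, -0.7]
for label, prob in top_k_labels(example_logits, k=3):
    print(f"{label}: {prob:.3f}")
```

With the Transformers code above, the input could plausibly be obtained as `outputs.logits.squeeze(0).tolist()` for a single image; that wiring is an assumption and is not shown in the original card.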