---
license: cc-by-nc-4.0
datasets:
- uoft-cs/cifar10
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- cifar10
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/mZz2vZy1IENHbtmXm1lUe.png)

# **MetaCLIP-2-Cifar10**

> **MetaCLIP-2-Cifar10** is an image classification model fine-tuned from the **facebook/metaclip-2-worldwide-s16** vision-language encoder for single-label classification.
> It assigns each image to one of the ten CIFAR-10 object classes using the **MetaClip2ForImageClassification** architecture.

> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

```
Classification report:

              precision    recall  f1-score   support

    airplane     0.9813    0.9685    0.9748      2000
  automobile     0.9777    0.9850    0.9813      2000
        bird     0.9560    0.9560    0.9560      2000
         cat     0.9104    0.9395    0.9247      2000
        deer     0.9566    0.9580    0.9573      2000
         dog     0.9476    0.9215    0.9343      2000
        frog     0.9774    0.9735    0.9755      2000
       horse     0.9704    0.9670    0.9687      2000
        ship     0.9782    0.9890    0.9836      2000
       truck     0.9774    0.9735    0.9755      2000

    accuracy                         0.9631     20000
   macro avg     0.9633    0.9632    0.9632     20000
weighted avg     0.9633    0.9631    0.9632     20000
```
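The columns follow the usual definitions: for each class, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The macro average is the unweighted mean over classes, and the weighted average weights each class by its support. Every class here has the same support (2000 images), so the macro and weighted averages coincide, and both recall averages equal the overall accuracy up to rounding in the last digit. The sketch below computes these quantities from a confusion matrix in plain Python. The helper names and the 3-class matrix are made up for illustration; they are not this model's evaluation code or results.

```python
def per_class_metrics(confusion):
    """Per-class (precision, recall, f1, support) from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    rows = []
    for c in range(n):
        tp = confusion[c][c]
        support = sum(confusion[c])                          # samples truly in class c
        predicted = sum(confusion[r][c] for r in range(n))   # samples predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((precision, recall, f1, support))
    return rows


def summarize(confusion):
    """Accuracy plus macro (unweighted) and support-weighted averages of P/R/F1."""
    rows = per_class_metrics(confusion)
    total = sum(r[3] for r in rows)
    accuracy = sum(confusion[c][c] for c in range(len(confusion))) / total
    macro = [sum(r[k] for r in rows) / len(rows) for k in range(3)]
    weighted = [sum(r[k] * r[3] for r in rows) / total for k in range(3)]
    return rows, accuracy, macro, weighted


# Hypothetical 3-class example (NOT this model's data), 10 samples per class.
toy = [
    [9, 1, 0],
    [0, 8, 2],
    [1, 0, 9],
]
rows, accuracy, macro, weighted = summarize(toy)
for name, (p, r, f, s) in zip(["a", "b", "c"], rows):
    print(f"{name:>8} {p:.4f} {r:.4f} {f:.4f} {s:>5}")
print(f"accuracy {accuracy:.4f}")
print("macro   ", " ".join(f"{v:.4f}" for v in macro))
print("weighted", " ".join(f"{v:.4f}" for v in weighted))
```

With equal support per class, the macro recall of the toy matrix matches its accuracy exactly, which is the same relationship visible in the report above.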

![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/dr7B2yAcfNEJ6ScY6XNC5.png)

---

The model classifies images into the following categories:

* **Class 0:** airplane
* **Class 1:** automobile
* **Class 2:** bird
* **Class 3:** cat
* **Class 4:** deer
* **Class 5:** dog
* **Class 6:** frog
* **Class 7:** horse
* **Class 8:** ship
* **Class 9:** truck
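The same index-to-name mapping can be written as the `id2label` / `label2id` dictionaries that Transformers model configs use. This is a minimal sketch that assumes the checkpoint's labels follow the order listed above. After loading, `model.config.id2label` shows the mapping the checkpoint actually ships with.

```python
# Class names in the index order listed above (0 = airplane ... 9 = truck).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Integer index -> class name, and the reverse lookup.
id2label = {i: name for i, name in enumerate(CIFAR10_CLASSES)}
label2id = {name: i for i, name in id2label.items()}

print(id2label[3])        # cat
print(label2id["ship"])   # 8
```

If the loaded config carries meaningful names, `model.config.id2label[idx]` could stand in for the hard-coded `labels` mapping in the inference code.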

# **Run with Transformers**

```shell
pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def cifar10_classification(image):
    """Predicts the CIFAR-10 class represented in an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Class names in index order (0 = airplane ... 9 = truck)
    labels = [
        "airplane", "automobile", "bird", "cat", "deer",
        "dog", "frog", "horse", "ship", "truck",
    ]
    predictions = {label: round(p, 3) for label, p in zip(labels, probs)}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
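Inside `cifar10_classification`, post-processing turns the 10 logits into probabilities with a softmax and rounds each one to three decimals for `gr.Label`. Outside the UI, you often want only the few most likely classes. The sketch below reproduces that step in plain Python so it can be checked on its own. The `softmax` and `top_k` helpers and the sample logits are hypothetical and are not part of the model or of Transformers.

```python
import math

CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                           # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for one image (not real model output).
example_logits = [0.1, 0.3, 1.2, 4.5, 0.2, 2.9, 0.4, 0.0, -0.5, -1.0]
for label, p in top_k(example_logits, CIFAR10_CLASSES, k=3):
    print(f"{label}: {p:.3f}")
```

With the model and processor loaded as in the example above, `top_k(outputs.logits[0].tolist(), CIFAR10_CLASSES)` should rank classes the same way as the Gradio scores, because softmax preserves the ordering of the logits.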

# **Sample Inference**

![Screenshot 2025-11-15 at 08-21-23 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/vPnT4-Imqykvjll9t5aYC.png)
![Screenshot 2025-11-15 at 08-26-25 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1vRKZKk8mWIhw4IV_DZYV.png)
![Screenshot 2025-11-15 at 08-22-10 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/72idt8H-cjX2pLOOTgNxZ.png)
![Screenshot 2025-11-15 at 08-22-41 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/VEE08FlRAaSzCaOyq6135.png)
![Screenshot 2025-11-15 at 08-23-53 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/SFjNL9AIkL0myJ2HSrjfk.png)
![Screenshot 2025-11-15 at 08-24-30 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/6M8Z5PlbD1QSJ5Sbdo1u-.png)
![Screenshot 2025-11-15 at 08-25-04 CIFAR-10 Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jNv67l2-M3c_TYmwGg25f.png)

# **Intended Use**

The **MetaCLIP-2-Cifar10** model is designed for object classification across the ten CIFAR-10 categories.
Potential use cases include:

* **Educational & Research Applications:** Benchmarking experiments, model comparison, and deep learning studies.
* **Lightweight Vision Systems:** Suitable for systems that only need to recognize the ten CIFAR-10 object categories.
* **Dataset Exploration:** Assisting in data inspection, annotation, and visualization.
* **Prototype Systems:** Suitable for rapid prototyping of image classification pipelines.