DINOv2 Fine-tuned on ImageNet-1k

This model is a fine-tuned version of DINOv2 ViT-Base/14 with registers (timm: vit_base_patch14_reg4_dinov2.lvd142m) on the ImageNet-1k dataset.

Model Details

  • Base Model: DINOv2 ViT-Base/14 with registers (reg4)
  • Fine-tuned on: ImageNet-1k (1000 classes)
  • Parameters: ~86M
  • Input Size: 224x224
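
As a quick sanity check on the architecture, the parameter count can be verified directly with timm. This is a minimal sketch; the model name is taken from the usage section below, and no pretrained download is required:

import timm

# Instantiate the same backbone used below (random weights suffice for a size check)
model = timm.create_model("vit_base_patch14_reg4_dinov2.lvd142m", pretrained=False, num_classes=1000)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 86M backbone plus the classifier head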

Usage

import torch
import timm
from PIL import Image

# Build the architecture (weights are loaded from the checkpoint below)
model_id = "vit_base_patch14_reg4_dinov2.lvd142m"
model = timm.create_model(model_id, pretrained=False)
model.head = torch.nn.Linear(768, 1000, bias=True)  # 1000-class head on the 768-dim ViT-Base features

# Load the fine-tuned weights from the local checkpoint
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference: build the preprocessing pipeline from the model's data config
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

image = Image.open("your_image.jpg").convert("RGB")  # ensure 3-channel input
input_tensor = transform(image).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    output = model(input_tensor)
    probabilities = torch.nn.functional.softmax(output[0], dim=0)
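
For a readable result, the top-5 classes can be printed. This continues the snippet above and is a sketch: it assumes a local imagenet_classes.txt with one ImageNet-1k label per line, which is not shipped with this repo:

# Top-5 predictions (continues the snippet above)
top5_prob, top5_idx = torch.topk(probabilities, 5)
with open("imagenet_classes.txt") as f:  # assumed label file, one class name per line
    labels = [line.strip() for line in f]
for p, i in zip(top5_prob, top5_idx):
    print(f"{labels[i]}: {p.item():.3f}")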

Training Details

  • Training samples: 15,000 (a subset of the full ImageNet-1k training set)
  • Validation samples: 5,000
  • Epochs: 20
  • Optimizer: AdamW
  • Learning rate: 1e-3
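
For reference, a fine-tuning loop consistent with the hyperparameters above might look as follows. This is a sketch, not the actual training script; the dataset path, batch size, and loader settings are assumptions:

import torch
import timm
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

# Start from the pretrained DINOv2 backbone with a fresh 1000-class head
model = timm.create_model("vit_base_patch14_reg4_dinov2.lvd142m", pretrained=True, num_classes=1000)
data_config = timm.data.resolve_model_data_config(model)
train_transform = timm.data.create_transform(**data_config, is_training=True)

# Assumed: an ImageFolder-style copy of the 15k-image training subset (hypothetical path)
train_set = datasets.ImageFolder("path/to/imagenet_subset/train", transform=train_transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(20):  # 20 epochs, as listed above
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()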

Citation

@article{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv preprint arXiv:2304.07193},
  year={2023}
}