---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Small)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture.
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~30% of the parameters** while achieving *higher accuracy* than the base model.

Unlike unstructured pruning (which zeros out individual weights), **structural pruning** physically removes entire channels and filters.
The result is a model that is natively **smaller and faster, with fewer FLOPs**, on standard hardware, without requiring specialized sparse inference engines.

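To make the distinction concrete, here is a toy, shapes-only sketch (illustrative only, not the ModHiFi algorithm): structural pruning shrinks the convolution weight tensor itself, whereas unstructured pruning merely zeros entries and leaves the shape unchanged.

```python
# Toy illustration only -- not the ModHiFi method. A Conv2d weight has shape
# (out_channels, in_channels, k, k); structural pruning drops whole output
# channels, so the tensor (and its parameter count) genuinely shrinks.

def num_params(shape):
    out_c, in_c, kh, kw = shape
    return out_c * in_c * kh * kw

def prune_output_channels(shape, keep_ratio):
    """Structurally prune: keep only a fraction of the output channels."""
    out_c, in_c, kh, kw = shape
    return (int(out_c * keep_ratio), in_c, kh, kw)

dense = (64, 64, 3, 3)                       # a typical 3x3 ResNet convolution
pruned = prune_output_channels(dense, 0.68)  # ~32% of output channels removed
print(num_params(dense), "->", num_params(pruned))  # 36864 -> 24768
```

Because the downstream layer's input channels shrink as well, the savings compound across the network, which is why no sparse inference engine is needed to realize them.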
- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base model:** [microsoft/resnet-50](https://huggingface.co/microsoft/resnet-50)

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |

On the hardware described in our [paper](https://arxiv.org/abs/2511.19566), we observe speedups of **1.69x on CPU** and **1.70x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.

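As a back-of-the-envelope illustration (an assumed convention, counting two operations per multiply-accumulate), the FLOPs of a single convolution layer can be estimated from its shape:

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """Approximate FLOPs for one Conv2d: 2 ops per multiply-accumulate."""
    return 2 * c_in * c_out * k * k * h_out * w_out

# A 3x3, 64->64 convolution on a 56x56 feature map (a typical ResNet-50 block):
print(conv2d_flops(64, 64, 3, 56, 56))  # 231211008, i.e. ~0.23 GFLOPs
```

Structural pruning reduces `c_in` and `c_out` directly, which is how the pruned model cuts total FLOPs from 4.12G to 1.9G.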
## ⚠️ Critical Note on Preprocessing & Accuracy

**Please read before evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5–1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
import torch
from torchvision import transforms
from transformers import BatchFeature, pipeline

# 1. Define the exact PyTorch validation transform
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# 2. Define a wrapper that forces the pipeline to use PyTorch preprocessing
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms and stack into a batch
        pixel_values = torch.stack(
            [self.transform(img.convert("RGB")) for img in images]
        )
        # Return a BatchFeature (not a plain dict) so the pipeline
        # can call .to(...) on the result
        return BatchFeature({"pixel_values": pixel_values})

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    image_processor=PyTorchProcessor(val_transform),  # closes the accuracy gap
    trust_remote_code=True,
    device=0,  # first GPU; omit (or use device=-1) for CPU
)
```

## Quick Start

If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch torchvision transformers pillow requests
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

# Load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference (pass top_k=5 to pipe() to get the five best classes)
results = pipe(image)
print(f"Predicted class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```