title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces

This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. Optional Gemini correction can refine the text.
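
The two-stage flow described above (detect text lines, then read each one) can be sketched as follows. This is a minimal illustration, not the code in inference.py: the names `reading_order`, `ocr_page`, `detect_lines`, `crop`, and `recognize` are hypothetical, the detector and recognizer are passed in as plain callables so the sketch stays independent of Detectron2 and TrOCR, and sorting by top edge then left edge is an assumed, simplistic reading order.

```python
from typing import Callable, List, Sequence, Tuple

# A text-line box as (x0, y0, x1, y1) in pixels. The layout is an assumption.
Box = Tuple[int, int, int, int]


def reading_order(boxes: Sequence[Box]) -> List[Box]:
    """Sort line boxes top-to-bottom, then left-to-right."""
    return sorted(boxes, key=lambda box: (box[1], box[0]))


def ocr_page(
    page,
    detect_lines: Callable[[object], Sequence[Box]],
    crop: Callable[[object, Box], object],
    recognize: Callable[[object], str],
) -> str:
    """Detect line boxes, read each crop, and join the non-empty lines."""
    boxes = reading_order(detect_lines(page))
    lines = [recognize(crop(page, box)) for box in boxes]
    return "\n".join(text for text in lines if text.strip())
```

In a real pipeline, `detect_lines` would wrap the Detectron2 predictor and `recognize` would wrap the TrOCR model; keeping them as parameters makes the control flow easy to test with stubs.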

Files

  • app.py: Gradio UI
  • inference.py: OCR pipeline (Detectron2 + TrOCR)
  • requirements.txt: Python dependencies (Detectron2 installed in Dockerfile)
  • Dockerfile: CUDA-enabled image for a GPU Space
  • model_final.pth: Detectron2 weights

Deploy on Hugging Face Spaces (Docker Space)

  1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
  2. Push these files to the Space repository (or connect this folder and git push).
  3. (Optional) Add a GEMINI_API_KEY secret in Space Settings → Secrets to enable Gemini correction.
  4. Wait for the build to finish. The app will start on port 7860.
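
Spaces expose secrets to the running container as environment variables, so the app can treat Gemini correction as available only when GEMINI_API_KEY is present. The sketch below shows one way to gate the feature; the helper names `gemini_api_key` and `should_correct` are hypothetical and are not taken from app.py.

```python
import os
from typing import Mapping, Optional


def gemini_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the GEMINI_API_KEY value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    return key or None


def should_correct(toggle_on: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Run correction only if the user asked for it and a key is configured."""
    return toggle_on and gemini_api_key(env) is not None
```

Accepting an optional mapping instead of always reading `os.environ` keeps the check easy to exercise without touching the real environment.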

Use

  1. Upload a PDF.
  2. (Optional) Toggle Split-page and Gemini correction. The standard pipeline is currently used regardless of the Split-page setting.
  3. Click Process.
  4. Download the ZIP of per-page JSONs. The full combined text is shown in the textbox.
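
The outputs in step 4 (one JSON per page, bundled in a ZIP, plus the combined text) could be produced along these lines. This is a sketch under assumptions: the file naming (`page_001.json`), the JSON fields (`page`, `text`), and the function names are illustrative and may differ from what the app actually writes.

```python
import io
import json
import zipfile
from typing import List


def pages_to_zip(page_texts: List[str]) -> bytes:
    """Pack one JSON document per page into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(page_texts, start=1):
            payload = {"page": number, "text": text}  # assumed schema
            archive.writestr(
                f"page_{number:03d}.json",
                json.dumps(payload, ensure_ascii=False, indent=2),
            )
    return buffer.getvalue()


def combined_text(page_texts: List[str]) -> str:
    """Join page texts with a blank line between pages."""
    return "\n\n".join(page_texts)
```

Building the archive in memory avoids temporary files; the resulting bytes can be written to disk wherever the UI expects a downloadable file.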

Local run (GPU recommended)

docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app

Then open http://localhost:7860

Notes

  • Detectron2 needs a GPU for reasonable speed; inference on CPU will be slow.
  • TEXTLINE_MODEL_PATH can be overridden via env var if the weights are elsewhere.
  • TrOCR models are downloaded on first run and cached in the container layer after warmup.
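
The TEXTLINE_MODEL_PATH override might be resolved as sketched below. The fallback to `model_final.pth` in the working directory is an assumption based on the file list, and `resolve_weights_path` and `require_weights` are hypothetical helper names, not functions from inference.py.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed default: the weights file shipped alongside the app.
DEFAULT_WEIGHTS = "model_final.pth"


def resolve_weights_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer TEXTLINE_MODEL_PATH when set and non-empty, else the default."""
    env = os.environ if env is None else env
    return Path(env.get("TEXTLINE_MODEL_PATH") or DEFAULT_WEIGHTS)


def require_weights(path: Path) -> Path:
    """Fail early with a clear message if the weights file is missing."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Detectron2 weights not found at {path}; "
            "set TEXTLINE_MODEL_PATH to the correct location."
        )
    return path
```

For a local run, the variable can be passed to the container with Docker's `-e` flag, for example `-e TEXTLINE_MODEL_PATH=/weights/model_final.pth`, where the path shown is only a placeholder.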