---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces

This repo contains a deployable Gradio app that detects text lines in PDF pages with Detectron2 and reads each line with TrOCR. An optional Gemini correction step can refine the recognized text.

Files

app.py: Gradio UI
inference.py: OCR pipeline (Detectron2 + TrOCR)
requirements.txt: Python dependencies (Detectron2 itself is installed by the Dockerfile)
Dockerfile: CUDA-enabled image for a GPU Space
model_final.pth: Detectron2 text-line detection weights
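The detect-then-read flow can be pictured roughly as follows. This is a minimal sketch, not the code in inference.py: the detector and recognizer are passed in as plain callables (in the real app they would wrap the Detectron2 model loaded from model_final.pth and a TrOCR model), the names Box, reading_order, and ocr_page are hypothetical, and sorting boxes top-to-bottom then left-to-right is an assumed reading-order heuristic.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned text-line box in page pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(boxes: Sequence[Box], line_tol: float = 10.0) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within a row.

    Assumed heuristic: a box whose top edge lies within `line_tol` pixels
    of the first box in the current row joins that row.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.y0):
        if rows and abs(box.y0 - rows[-1][0].y0) <= line_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]


def ocr_page(
    page: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> List[str]:
    """Detect line boxes on a page, order them, and recognize each line."""
    return [recognize(page, box) for box in reading_order(detect(page))]


# Usage with stubs standing in for the Detectron2 detector and the TrOCR reader.
top, bottom = Box(0, 0, 100, 20), Box(0, 50, 100, 70)
fake_text = {top: "first line", bottom: "second line"}
lines = ocr_page("page-1", lambda page: [bottom, top], lambda page, box: fake_text[box])
print(lines)  # ['first line', 'second line']
```

Keeping the models behind callables makes the ordering and assembly logic testable without a GPU or downloaded weights.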

Deploy on Hugging Face Spaces (Docker Space)

Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
Push these files to the Space repository (for example, add the Space as a git remote for this folder and git push).
(Optional) Add GEMINI_API_KEY as a secret in Space Settings → Secrets to enable Gemini correction.
Wait for the build to finish. The app will start on port 7860.
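Hugging Face exposes Space secrets to the running app as environment variables. A minimal sketch, assuming the app reads GEMINI_API_KEY from the environment and simply skips correction when it is missing; the helper names are hypothetical and not taken from app.py:

```python
import os
from typing import Mapping, Optional


def gemini_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the GEMINI_API_KEY secret, or None if it is unset or blank."""
    key = env.get("GEMINI_API_KEY", "").strip()
    return key or None


def correction_available(env: Mapping[str, str] = os.environ) -> bool:
    # Gemini correction can only be offered when the secret is configured.
    return gemini_api_key(env) is not None
```

Passing the environment as a parameter keeps the check easy to test; by default it reads the live process environment.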

Use

Upload a PDF.
(Optional) Toggle Split-page and Gemini correction. Split-page currently has no effect: the standard pipeline is always used.
Click Process.
Download the ZIP of per-page JSON files. The combined text of all pages is shown in the textbox.
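The two outputs (a ZIP of per-page JSON files and one combined text) could be produced along these lines. This is a hypothetical sketch: the file names (page_001.json, ...), the JSON schema ({"page": ..., "lines": [...]}), and the blank-line page separator are assumptions, not the app's actual format.

```python
import io
import json
import zipfile
from typing import List, Tuple


def package_results(pages: List[List[str]]) -> Tuple[bytes, str]:
    """Bundle per-page OCR lines into ZIP bytes plus a combined text string.

    `pages[i]` holds the recognized lines of page i + 1 (hypothetical layout).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for number, lines in enumerate(pages, start=1):
            record = {"page": number, "lines": lines}
            zf.writestr(f"page_{number:03d}.json", json.dumps(record, ensure_ascii=False, indent=2))
    # Lines within a page are joined by newlines; pages by a blank line.
    combined = "\n\n".join("\n".join(lines) for lines in pages)
    return buf.getvalue(), combined


zip_bytes, text = package_results([["Hello", "world"], ["Page two"]])
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(zf.namelist())  # ['page_001.json', 'page_002.json']
```

Building the archive in memory avoids temporary files; a Gradio File output would still need the bytes written to a path before download.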

Local run (GPU recommended)

docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app

Then open http://localhost:7860 in a browser.

Notes

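Detectron2 needs a GPU for reasonable speed; it runs on CPU, but slowly.
Set the TEXTLINE_MODEL_PATH environment variable to load the Detectron2 weights from a location other than model_final.pth.
TrOCR models are downloaded from the Hugging Face Hub on first run and cached inside the running container, so later runs in the same container skip the download.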
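The weights-path override and the GPU/CPU choice could be resolved as in this minimal sketch. The default path and the helper names are hypothetical; in a real app, cuda_available would typically come from torch.cuda.is_available(), and the returned string would go into Detectron2's MODEL.DEVICE config option, which accepts "cuda" or "cpu".

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical default: the weights file shipped alongside the app code.
DEFAULT_WEIGHTS = Path("model_final.pth")


def textline_weights_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the Detectron2 weights path, honoring TEXTLINE_MODEL_PATH if set."""
    override = env.get("TEXTLINE_MODEL_PATH")
    return Path(override) if override else DEFAULT_WEIGHTS


def pick_device(cuda_available: bool) -> str:
    # Prefer the GPU; fall back to CPU, which works but is much slower.
    return "cuda" if cuda_available else "cpu"
```

With Docker, the override can be passed as -e TEXTLINE_MODEL_PATH=/path/to/model_final.pth on the docker run command line, with the file mounted into the container.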