---
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
- trl
- sft
- chemistry
- code
- climate
- art
- biology
- finance
- legal
- music
- medical
- agent
license: apache-2.0
language:
- en
- ab
- aa
- ae
- af
- ak
- am
- an
- ar
- as
- av
- ay
- az
- ba
- be
- bg
- bh
- bi
- bm
- bn
- bo
- br
- bs
- ca
- ce
- ch
- co
- cr
- cs
- cu
- cv
- cy
- da
- de
- dv
- dz
- ee
- el
- eo
- es
- et
- eu
- fa
- ff
- fi
- fj
- fo
- fr
- fy
- ga
- gd
- gl
- gn
- gv
- ha
- he
- hi
- ho
- gu
- hr
- ht
- hu
- hz
- hy
- id
- ia
- ig
- ie
- ik
- ii
- is
- io
- iu
- it
- jv
- ja
- kg
- ka
- kj
- ki
- kl
- kk
- kn
- km
- kr
- ko
- ku
- ks
- kw
- kv
- la
- ky
- lg
- lb
- ln
- li
- lt
- lo
- lv
- lu
- mg
- mi
- mh
- ml
- mk
- mr
- mn
- mt
- ms
- na
- my
- nd
- nb
- ng
- nl
- ne
- 'no'
- nn
- nv
- nr
- oc
- oj
- om
- ny
- os
- or
- pa
- pi
- pl
- ps
- pt
- rm
- rn
- qu
- ro
- ru
- sn
- rw
- so
- sa
- sc
- sd
pipeline_tag: image-text-to-text
library_name: transformers
---

<img src='bannerocr.png'>

# 🖼️ Next OCR 8B

### *Compact OCR AI: Accurate, Fast, Multilingual, Math-Optimized*


---

## 📖 Overview

**Next OCR 8B** is an **8-billion-parameter model** optimized for **optical character recognition (OCR)**, with an emphasis on **mathematical and tabular content**.

It supports **multilingual OCR** (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and others) with high accuracy, including on structured documents such as tables, forms, and formulas.


---

## ⚡ Highlights

* 🖼️ Accurate text extraction, including math and tables
* 🌍 Multilingual support (30+ languages)
* ⚡ Lightweight and efficient
* 💬 Instruction-tuned for document understanding and analysis


---

## 📊 Benchmark & Comparison

---

| Model                           | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
| ------------------------------- | ------------------------ | ------------------------- | -------------------------------- |
| **Next OCR**                    | **99.0**                 | **96.8**                  | **95.3**                         |
| PaddleOCR                       | 95.2                     | 93.9                      | 95.3                             |
| Deepseek OCR                    | 90.6                     | 87.4                      | 86.1                             |
| Tesseract                       | 92.0                     | 88.4                      | 72.0                             |
| EasyOCR                         | 90.4                     | 84.7                      | 78.9                             |
| Google Cloud Vision / DocAI     | 98.7                     | 95.5                      | 93.6                             |
| Amazon Textract                 | 94.7                     | 86.2                      | 86.1                             |
| Azure Document Intelligence     | 95.1                     | 93.6                      | 91.4                             |


---

| Model                       | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
| --------------------------- | --------------- | -------------- | ------------------ |
| **Next OCR**                | 92              | 96             | 91                 |
| PaddleOCR                   | 88              | 92             | 90                 |
| Deepseek OCR                | 80              | 85             | 83                 |
| Tesseract                   | 75              | 88             | 70                 |
| EasyOCR                     | 78              | 86             | 75                 |
| Google Cloud Vision / DocAI | 90              | 95             | 92                 |
| Amazon Textract             | 85              | 90             | 88                 |
| Azure Document Intelligence | 87              | 91             | 89                 |

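
The tables above report accuracy percentages, but this card does not say how they were computed. As a rough illustration only, one common way to score OCR output is character accuracy, defined as 100 × (1 − character error rate), where the error rate is the Levenshtein edit distance divided by the reference length. The helper names `levenshtein` and `char_accuracy` below are hypothetical and are not part of this model's tooling or of any benchmark cited here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete a character of `a`
                            curr[j - 1] + 1,       # insert a character of `b`
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(prediction: str, reference: str) -> float:
    """Character accuracy in percent: 100 * (1 - CER), clipped at 0."""
    if not reference:
        # Assumption: an empty reference is only matched by an empty prediction.
        return 100.0 if not prediction else 0.0
    cer = levenshtein(prediction, reference) / len(reference)
    return max(0.0, 100.0 * (1.0 - cer))


if __name__ == "__main__":
    # One wrong character out of five in the reference.
    print(char_accuracy("hallo", "hello"))
```

Scores from a sketch like this are only comparable across systems when the same reference transcriptions and the same text normalization (case, whitespace, Unicode form) are used for every model.
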

---

## 🚀 Installation & Usage

```python
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "Lamapi/next-ocr"

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

img = Image.open("image.jpg")

# The user content list must include both an image and a text entry.
messages = [
    {"role": "system", "content": "You are Next-OCR, a helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."},
        ],
    },
]

# Render the chat template to a prompt string, then tokenize it together with the image.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(generated[0], skip_special_tokens=True))
```
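
For decoder-only models, `generate` in `transformers` typically returns the prompt tokens followed by the newly generated ones, so the decoded string in the example above may repeat the prompt before the answer. A minimal sketch of trimming the prompt, assuming that behaviour holds for this model, is shown here on plain lists of token ids; `strip_prompt_tokens` is a hypothetical helper, not a library function.

```python
def strip_prompt_tokens(generated_ids, prompt_length):
    """Drop the first `prompt_length` ids from every generated sequence.

    `generated_ids` is a list of token-id lists (for example the result of
    calling `.tolist()` on the tensor returned by `model.generate`).
    """
    return [list(seq[prompt_length:]) for seq in generated_ids]


if __name__ == "__main__":
    # Dummy ids: the first three stand in for the prompt, the rest for the answer.
    dummy_output = [[101, 102, 103, 7, 8, 9]]
    print(strip_prompt_tokens(dummy_output, prompt_length=3))
```

With the variables from the usage example, this would be called as `strip_prompt_tokens(generated.tolist(), inputs["input_ids"].shape[1])`, and the result could then be passed to `processor.batch_decode(..., skip_special_tokens=True)`.
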

---

## 🧩 Key Features

| Feature                    | Description                                                     |
| -------------------------- | --------------------------------------------------------------- |
| 🖼️ High-Accuracy OCR      | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support  | Works with 30+ languages including Turkish.                     |
| ⚡ Lightweight & Efficient  | Optimized for resource-constrained environments.                |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas.               |
| 🏢 Reliable Outputs        | Suitable for enterprise document workflows.                     |


---

## 📐 Model Specifications

| Specification     | Details                                                   |
| ----------------- | --------------------------------------------------------- |
| **Base Model**    | Qwen 3                                                    |
| **Parameters**    | 8 Billion                                                 |
| **Architecture**  | Vision + Transformer (OCR LLM)                            |
| **Modalities**    | Image-to-text                                             |
| **Fine-Tuning**   | OCR datasets with multilingual and math/tabular content   |
| **Optimizations** | Quantization-ready, FP16 support                          |
| **Primary Focus** | Text extraction, document understanding, mathematical OCR |

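
To put the parameter count and the FP16 and quantization entries in perspective, the arithmetic below estimates how much memory the weights alone would occupy at different precisions. It is a back-of-the-envelope sketch: it assumes exactly 8 × 10⁹ parameters (the nominal figure from the table), ignores activations, the KV cache, and framework overhead, and the name `weight_memory_gib` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 8e9  # assumption: the nominal "8 Billion" from the table
    for dtype in ("fp32", "fp16", "int8", "int4"):
        print(f"{dtype}: ~{weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

Under these assumptions the FP16 weights come to roughly 15 GiB, and a 4-bit quantized copy to roughly a quarter of that; actual usage at inference time will be higher.
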

---

## 🎯 Ideal Use Cases

* Document digitization
* Invoice & receipt processing
* Multilingual OCR pipelines
* Table, form, and formula extraction
* Enterprise document management

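
For pipelines such as invoice or table extraction, the model's text output usually needs to be turned into structured records. The sketch below assumes the model has been prompted to return tables as Markdown, which this card does not guarantee, and shows one way to convert such text into a list of dictionaries. `parse_markdown_table` is a hypothetical helper, and it does not handle escaped pipes inside cells.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the pipe-delimited rows found in `text` into one dict per data row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Only consider lines that look like table rows: | a | b |
        if not (line.startswith("|") and line.endswith("|")):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row, whose cells contain only dashes and colons.
    body = [r for r in body if not all(c and set(c) <= set("-:") for c in r)]
    return [dict(zip(header, r)) for r in body]


if __name__ == "__main__":
    sample = (
        "| Item | Price |\n"
        "| --- | ---: |\n"
        "| Pen | 1.50 |\n"
        "| Book | 12.00 |"
    )
    for record in parse_markdown_table(sample):
        print(record)
```

Values are kept as strings on purpose, since number and currency formats vary across languages and documents; any conversion would be a separate, locale-aware step.
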

---

## 📄 License

MIT License: free for commercial and non-commercial use.


---

## 📞 Contact & Support

* 📧 Email: [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 HuggingFace: [Lamapi](https://huggingface.co/Lamapi)


---

> **Next OCR**: Compact *OCR + math-capable* AI, blending **accuracy**, **speed**, and **multilingual document intelligence**.
