Bapt120 committed
Commit 2676f7d · 1 Parent(s): b6b0cc8

Update README.md

Files changed (1)
  1. README.md +12 -29
README.md CHANGED
@@ -21,17 +21,15 @@ tags:
  - pdf
  - tables
  - forms
- - bounding-boxes
- - image-localization
  ---

  <div align="center">
- <img src="lightonocr-banner.png" alt="LightOnOCR-2-1B-bbox-base Banner" width="600"/>
+ <img src="lightonocr-banner.png" alt="LightOnOCR-2-1B-base Banner" width="600"/>
  </div>

- # LightOnOCR-2-1B-bbox-base
+ # LightOnOCR-2-1B-base

- **Base model with bounding boxes for fine-tuning.** This is the pre-RLVR checkpoint that predicts both text and image bounding boxes, ideal as a starting point for domain adaptation with localization capabilities.
+ **Base model for fine-tuning.** This is the pre-RLVR checkpoint with strong OCR capabilities, ideal as a starting point for domain adaptation and custom fine-tuning.

  ## Highlights

@@ -60,18 +58,6 @@ tags:

  ---

- ## Image Localization
-
- The output format for embedded images is:
-
- ```
- ![image](image_N.png) x1,y1,x2,y2
- ```
-
- Where coordinates are normalized to `[0, 1000]`.
-
- ---
-
  ## Benchmarks

  <div align="center">
@@ -82,7 +68,7 @@ Where coordinates are normalized to `[0, 1000]`.

  ---

- ## Installation
+ ## Usage with Transformers

  > **Note:** LightOnOCR-2 requires transformers installed from source (not yet in a stable release).

@@ -91,10 +77,6 @@ uv pip install git+https://github.com/huggingface/transformers
  uv pip install pillow pypdfium2
  ```

- ---
-
- ## Usage with Transformers
-
  ```python
  import torch
  from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
@@ -102,8 +84,8 @@ from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
  device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
  dtype = torch.float32 if device == "mps" else torch.bfloat16

- model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-2-1B-bbox-base", torch_dtype=dtype).to(device)
- processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-2-1B-bbox-base")
+ model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-2-1B-base", torch_dtype=dtype).to(device)
+ processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-2-1B-base")

  url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"

@@ -129,7 +111,7 @@ print(output_text)
  ## Usage with vLLM

  ```bash
- vllm serve lightonai/LightOnOCR-2-1B-bbox-base \
+ vllm serve lightonai/LightOnOCR-2-1B-base \
  --limit-mm-per-prompt '{"image": 1}' --mm-processor-cache-gb 0 --no-enable-prefix-caching
  ```

@@ -140,7 +122,7 @@ import pypdfium2 as pdfium
  import io

  ENDPOINT = "http://localhost:8000/v1/chat/completions"
- MODEL = "lightonai/LightOnOCR-2-1B-bbox-base"
+ MODEL = "lightonai/LightOnOCR-2-1B-base"

  # Download PDF from arXiv
  pdf_url = "https://arxiv.org/pdf/2412.13663"
@@ -189,11 +171,12 @@ print(text)

  ## Fine-tuning

- LightOnOCR-2-1B-bbox-base is fully differentiable and supports:
+ LightOnOCR-2-1B-base is fully differentiable and supports:

  * LoRA fine-tuning
- * Domain adaptation with image localization requirements
- * Custom RLVR training with IoU-based or custom reward functions
+ * Domain adaptation (receipts, scientific articles, forms, etc.)
+ * Multilingual fine-tuning with task-specific corpora
+ * Custom RLVR training with your own reward functions

  ---

 
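The Fine-tuning list in the updated card mentions LoRA fine-tuning and custom RLVR reward functions without showing code. Below is a minimal, illustrative sketch of what that could look like, assuming the checkpoint loads through `LightOnOcrForConditionalGeneration` as in the usage snippet, that `peft` LoRA adapters are compatible with this architecture, and that `q_proj`/`v_proj` are valid attention projection module names (all of these are assumptions, not stated in the model card):

```python
# Sketch only: attach LoRA adapters to the base checkpoint and define a simple
# text-similarity reward that a custom RLVR setup could use.
# Assumptions: peft supports this architecture, and "q_proj"/"v_proj" exist as
# module names; verify with model.named_modules() before training.
import torch
from peft import LoraConfig, get_peft_model
from transformers import LightOnOcrForConditionalGeneration

model = LightOnOcrForConditionalGeneration.from_pretrained(
    "lightonai/LightOnOCR-2-1B-base", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()


def ocr_reward(prediction: str, reference: str) -> float:
    """Example reward: 1 - normalized Levenshtein distance between the
    predicted transcription and the ground-truth text."""
    if not reference:
        return 1.0 if not prediction else 0.0
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (p != r)))   # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(prediction), len(reference))
```

From here, the adapter could be trained with a standard Trainer loop on (page image, transcription) pairs, with `ocr_reward` standing in for whatever reward function an actual RLVR setup would use.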