Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis
Model Details
Model Description
Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.
Model Sources
- Repository: https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis
- Base model: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
- Fine-tuning run: W&B Experiment
Uses
This model can be used for captioning or summarizing visual elements that contain textual information, and extracting structured text (e.g., app names, UI labels, error messages) from software screenshots.
The model is not designed for:
- Handwritten OCR
- Scene text in natural environments (e.g., street signs)
- Legal or financial document processing without human review
Training Details
Training Data
The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.
The dataset focuses on clean UI text extraction rather than general image captioning.
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 6 |
| Batch size | 8 |
| Learning rate | 2.5e-4 |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Precision | bfloat16 (mixed) |
| Optimizer | AdamW |
| Scheduler | Cosine decay |
| Gradient accumulation | 2 |
| Weight decay | 0.01 |
Model tree for Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis
Base model
Qwen/Qwen3-VL-2B-Instruct