Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis

Model Details

Model Description

Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

Model Sources


Uses

This model can be used for captioning or summarizing visual elements that contain textual information, and extracting structured text (e.g., app names, UI labels, error messages) from software screenshots.

The model is not designed for:

  • Handwritten OCR
  • Scene text in natural environments (e.g., street signs)
  • Legal or financial document processing without human review

Training Details

Training Data

The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements. The dataset focuses on clean UI text extraction rather than general image captioning.

Training Hyperparameters

Parameter Value
Epochs 6
Batch size 8
Learning rate 2.5e-4
LoRA rank 64
LoRA alpha 64
Precision bfloat16 (mixed)
Optimizer AdamW
Scheduler Cosine decay
Gradient accumulation 2
Weight decay 0.01
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis

Dataset used to train Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis

Collection including Vokturz/Loyca-Qwen3-VL-2B-Instruct-ImgAnalysis