hanxiao committed on
Commit ea73f5d · 1 Parent(s): c2bea19

Remove placeholder links, add CLI example with input/output table

Files changed (1)
  1. README.md +35 -1
README.md CHANGED
@@ -25,7 +25,7 @@ inference: false
 
 # jina-vlm-v1: Small Multilingual Vision Language Model
 
-[Blog](https://jina.ai/news/jina-vlm-v1) | [API](https://jina.ai/api) | [Arxiv](https://arxiv.org/abs/2512.04032)
+Blog | API | [Arxiv](https://arxiv.org/abs/2512.04032)
 
 `jina-vlm-v1` is a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The model couples a SigLIP2 vision encoder with a Qwen3 language backbone through an attention-pooling connector that enables token-efficient processing of arbitrary-resolution images.
 
@@ -81,6 +81,40 @@ python infer.py -p "What is the capital of France?"
 - `--max-pixels`: Max pixels per image, larger images are resized preserving aspect ratio.
 - `--stream`: Enable streaming output.
 
+**Example:**
+
+```bash
+python infer.py -i assets/the_persistence_of_memory.jpg -p "Describe this picture"
+```
+
+<table>
+<tr>
+<td width="40%"><b>Input</b></td>
+<td width="60%"><b>Output</b></td>
+</tr>
+<tr>
+<td><img src="./assets/the_persistence_of_memory.jpg" width="100%"></td>
+<td>
+
+```
+├── 🖼️ Images: ['the_persistence_of_memory.jpg']
+├── 📜 Prompt: Describe this picture
+└── 🧠 Response: This image is a surrealistic
+painting by Salvador Dalí, titled "The Persistence
+of Memory." The painting is characterized by its
+dreamlike and distorted elements, which are
+hallmarks of Dalí's style. The central focus of
+the painting is a melting clock, which is a key
+symbol in the artwork...
+
+Token usage: 1753 tokens (4.3%)
+Generated in 33.08s | 8.16 tok/s
+```
+
+</td>
+</tr>
+</table>
+
 ### Using Transformers
 
 ```python
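
The `--max-pixels` option in the README diff above says larger images are resized while preserving aspect ratio. A minimal sketch of what such a computation could look like follows. It is not taken from `infer.py`: the function name `fit_within_max_pixels` and the choice to round down are assumptions made for illustration, and the actual script may round or align dimensions differently.

```python
import math


def fit_within_max_pixels(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Return a (width, height) whose area is at most max_pixels.

    Hypothetical helper, not part of infer.py. Images already within the
    budget are returned unchanged; larger ones are scaled down uniformly.
    """
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("width, height, and max_pixels must be positive")

    if width * height <= max_pixels:
        return width, height

    # Scaling both sides by the same factor keeps the aspect ratio.
    # The area shrinks by scale**2, so scale is the square root of the ratio.
    scale = math.sqrt(max_pixels / (width * height))

    # Rounding down guarantees the result never exceeds the pixel budget.
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    # A 4000x3000 image limited to one million pixels.
    print(fit_within_max_pixels(4000, 3000, 1_000_000))
```

Rounding down trades a few pixels of resolution for a hard guarantee that the budget is respected, which matters if the limit exists to bound the number of visual tokens.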