Qwen3-Embedding-0.6B (ONNX Standard / FP32)

This repository contains the unquantized (FP32), ONNX-exported version of Qwen/Qwen3-Embedding-0.6B.

It keeps the weights in full FP32 precision, with no quantization loss, and is compatible with Hugging Face's Text Embeddings Inference (TEI) and the optimum library.

Model Details

Attribute	Detail
Base Model	Qwen/Qwen3-Embedding-0.6B
Format	ONNX (Opset 17)
Quantization	None (FP32 / Standard Precision)
Task	Feature Extraction / Semantic Embedding
File Size	~2.4 GB

Usage with Text Embeddings Inference (TEI)

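This model is pre-configured for TEI. Note: --auto-truncate is required because the model supports a 32k-token context, while TEI's batch token limit is smaller by default. With the flag set, inputs that exceed the limit are truncated rather than rejected.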

Option A: Docker CLI

docker run --rm -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
    --model-id Svenni551/Qwen3-Embedding-0.6B-ONNX \
    --pooling mean \
    --auto-truncate

Option B: Docker Compose

services:
  embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
    environment:
      - MODEL_ID=Svenni551/Qwen3-Embedding-0.6B-ONNX
      - POOLING=mean
      - MAX_CLIENT_BATCH_SIZE=8
      - MAX_BATCH_TOKENS=2048
      - AUTO_TRUNCATE=true
    volumes:
      - ./data:/data
    ports:
      - "8080:80"
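
Once either container is running, the server can be queried over HTTP. The following is a minimal sketch, assuming the port mapping shown above (host port 8080) and TEI's /embed route, which accepts a JSON body with an "inputs" field. The helper names build_embed_request, batches, and embed are hypothetical and not part of TEI; the batch size of 8 simply mirrors MAX_CLIENT_BATCH_SIZE from Option B.

```python
import json
import urllib.request

# Assumption: the container is reachable on host port 8080, as mapped above.
TEI_URL = "http://localhost:8080/embed"


def build_embed_request(texts, url=TEI_URL):
    """Build (without sending) a POST request for TEI's /embed route."""
    payload = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def batches(texts, size=8):
    """Yield slices of at most `size` texts so one request stays within the client batch limit."""
    texts = list(texts)
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def embed(texts, url=TEI_URL, timeout=30):
    """Send texts to a running TEI server batch by batch; return one vector per text."""
    vectors = []
    for batch in batches(texts):
        request = build_embed_request(batch, url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            vectors.extend(json.loads(response.read().decode("utf-8")))
    return vectors
```

With a container up, calling embed(["Hello World"]) should return a list holding a single embedding vector. Splitting into batches on the client side is a design choice for this sketch, since a request with more inputs than the server's client batch limit would otherwise be refused.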

Usage with Python (Optimum)

pip install "optimum[onnxruntime]" transformers

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

model_id = "Svenni551/Qwen3-Embedding-0.6B-ONNX"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(model_id)

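# Tokenize the input; padding and truncation keep batched inputs rectangular
inputs = tokenizer("Hello World", padding=True, truncation=True, return_tensors="pt")

# Run the ONNX model (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling: average the token embeddings while ignoring padding positions
attention_mask = inputs["attention_mask"]
token_embeddings = outputs.last_hidden_state  # (batch_size, seq_len, hidden_size)
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
# Sum the unmasked vectors, then divide by the number of real tokens (clamped to avoid division by zero)
embeddings = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

print(embeddings.shape)  # (batch_size, hidden_size)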
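
To make the pooling step concrete, here is a small dependency-free sketch of the same masked mean on toy numbers, together with the cosine similarity commonly used to compare embeddings. The values are made up for illustration, and masked_mean and cosine_similarity are hypothetical helpers, not part of optimum or transformers.

```python
import math


def masked_mean(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1 (one sequence, plain lists)."""
    hidden = len(token_embeddings[0])
    summed = [0.0] * hidden
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(sum(attention_mask), 1e-9)  # same clamp as the torch version
    return [s / count for s in summed]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy sequence: two real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = masked_mean(tokens, mask)
print(pooled)  # [2.0, 3.0]; the padding vector [9.0, 9.0] is ignored
print(cosine_similarity(pooled, [4.0, 6.0]))  # close to 1.0: same direction
```

The padding row contributes nothing to the result, which is the point of multiplying by the expanded attention mask in the torch code.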
