Model Details

CRE-T1-SFT-preview-1202 is a text embedding model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It is designed for semantic search and retrieval tasks, with a particular focus on reasoning-enhanced query understanding.

  • Model type: Text Embedding (Dual-Tower Architecture)
  • Language(s): English
  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameters: 1.7B
  • Context Length: 8,000 tokens
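
A minimal usage sketch follows, assuming the checkpoint loads with the standard Hugging Face transformers API and that mean pooling over the last hidden states yields the embedding. The pooling strategy and any query/document prompt format are not specified in this card, so treat these details as assumptions and check the released code for the exact recipe.

```python
# Hypothetical usage sketch: load the checkpoint and mean-pool hidden states
# into unit-length embeddings. Pooling and prompting are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "CITech/CRE-T1-SFT-preview-1202"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=8000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)            # (batch, seq, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean over real tokens
    return F.normalize(pooled, dim=-1)                      # unit-length vectors

query_emb = embed(["why do leaves change color in autumn"])
doc_emb = embed(["Chlorophyll breaks down in the fall, revealing carotenoids."])
print((query_emb @ doc_emb.T).item())                       # cosine similarity
```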

Model Architecture

The model adopts an asymmetric dual-tower encoding architecture to accommodate the heterogeneous characteristics of queries and documents; a conceptual sketch follows the list:

  • Query Tower: Integrates reasoning enhancement mechanisms to deepen semantic understanding by leveraging the generative reasoning capabilities of the underlying LLM
  • Document Tower: Optimizes encoding efficiency to ensure high throughput in index construction
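
The card does not publish the exact reasoning-enhancement mechanism. One plausible reading is that the query side spends extra compute, for example appending an LLM-generated reasoning trace to the query before encoding, while the document side encodes raw text in a single pass. The sketch below illustrates that split; it reuses the hypothetical `embed` helper from above, and the reasoning prompt is a placeholder, not the model's actual prompt.

```python
# Illustrative split of the two towers (interpretation, not the released code).

def encode_document(doc: str):
    # Document tower: one plain forward pass per document, so building a
    # large index stays cheap.
    return embed([doc])

def encode_query(query: str, reasoner=None):
    # Query tower: optionally augment the query with a generated reasoning
    # trace before embedding, bridging the gap to document vocabulary.
    if reasoner is not None:
        trace = reasoner("Describe what a document answering this query "
                         "would contain: " + query)
        query = query + "\n" + trace
    return embed([query])
```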

Training Details

Training Data

The model was trained using supervised fine-tuning (SFT) on retrieval-focused datasets.

Training Procedure

During the SFT phase, training uses a multi-objective joint optimization strategy with the following loss function:

$$L_{\text{total}} = 1 \cdot L_{\text{SFT}} + 1 \cdot L_{\text{InfoNCE}} + 2 \cdot L_{\text{TripletMargin}}$$

This simultaneously optimizes three objectives, sketched in code after the list:

  • Language Modeling (L_SFT): Maintains the generative reasoning capabilities of the base model
  • Contrastive Learning (L_InfoNCE): Enhances semantic discrimination between relevant and irrelevant pairs
  • Triplet Constraints (L_TripletMargin): Strengthens the relative positioning of query-document pairs
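
A minimal PyTorch sketch of this joint objective is given below, using the weights 1, 1, 2 from the formula above. The in-batch negative construction, temperature, and margin are assumptions, not published hyperparameters.

```python
# Sketch of the joint SFT objective; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def joint_loss(q, d_pos, d_neg, lm_logits, lm_labels, tau=0.05, margin=0.2):
    # L_SFT: next-token cross-entropy on the SFT text, preserving the base
    # model's generative reasoning capabilities.
    l_sft = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                            lm_labels.view(-1), ignore_index=-100)

    # L_InfoNCE: in-batch contrastive loss; each query's positive document
    # competes against every other document in the batch.
    logits = q @ d_pos.T / tau                  # (B, B) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    l_infonce = F.cross_entropy(logits, targets)

    # L_TripletMargin: keep the positive closer than an explicit hard
    # negative by at least `margin`, in cosine distance on unit vectors.
    l_triplet = F.triplet_margin_with_distance_loss(
        q, d_pos, d_neg,
        distance_function=lambda a, b: 1 - F.cosine_similarity(a, b),
        margin=margin)

    return 1 * l_sft + 1 * l_infonce + 2 * l_triplet
```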

Key Highlights

  1. Reasoning-Enhanced Embeddings: Leverages the generative reasoning capabilities of the LLM base to enrich query embeddings, bridging the semantic gap between the original query and its target documents and yielding significant gains on reasoning-intensive retrieval tasks.

  2. Multi-Objective Optimization: The joint loss function ensures that the model maintains its reasoning capabilities while learning effective retrieval representations.

  3. Asymmetric Architecture: The dual-tower design allows for specialized optimization of query and document encoders based on their distinct characteristics and usage patterns.

Evaluation

Benchmark Results

The model was evaluated on BRIGHT, a benchmark for reasoning-intensive retrieval.

| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CRE-T1-SFT-preview-1202 | 48.9 | 47.4 | 26.9 | 42.7 | 26.8 | 30.6 | 31.1 | 16.9 | 5.4 | 3.1 | 21.7 | 34.8 | 28.0 |

Bio. through Sus. are the StackExchange splits; Leet. and Pony are the Coding splits; AoPS, TheoQ., and TheoT. are the Theorem-based splits.