Model Details

CRE-T1-SFT-preview-1202 is a text embedding model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It is designed for semantic search and retrieval tasks, with a particular focus on reasoning-enhanced query understanding.

  • Model type: Text Embedding (Dual-Tower Architecture)
  • Language(s): English
  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameters: 1.7B
  • Context Length: 8,000 tokens
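
A minimal usage sketch follows, assuming the checkpoint loads with the standard Hugging Face transformers API and that mean pooling over the last hidden states yields the embedding. The pooling strategy and any query/document prompt format are not specified in this card, so treat these details as assumptions and check the released code for the exact recipe.

```python
# Hypothetical usage sketch: load the checkpoint and mean-pool hidden states
# into unit-length embeddings. Pooling and prompting are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "CITech/CRE-T1-SFT-preview-1202"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=8000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)            # (batch, seq, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean over real tokens
    return F.normalize(pooled, dim=-1)                      # unit-length vectors

query_emb = embed(["why do leaves change color in autumn"])
doc_emb = embed(["Chlorophyll breaks down in the fall, revealing carotenoids."])
print((query_emb @ doc_emb.T).item())                       # cosine similarity
```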

Model Architecture

The model adopts an asymmetric dual-tower encoding architecture to accommodate the heterogeneous characteristics of queries and documents; a conceptual sketch follows the list:

  • Query Tower: Integrates reasoning enhancement mechanisms to deepen semantic understanding by leveraging the generative reasoning capabilities of the underlying LLM
  • Document Tower: Optimizes encoding efficiency to ensure high throughput in index construction
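
The card does not publish the exact reasoning-enhancement mechanism. One plausible reading is that the query side spends extra compute, for example appending an LLM-generated reasoning trace to the query before encoding, while the document side encodes raw text in a single pass. The sketch below illustrates that split; it reuses the hypothetical `embed` helper from above, and the reasoning prompt is a placeholder, not the model's actual prompt.

```python
# Illustrative split of the two towers (interpretation, not the released code).

def encode_document(doc: str):
    # Document tower: one plain forward pass per document, so building a
    # large index stays cheap.
    return embed([doc])

def encode_query(query: str, reasoner=None):
    # Query tower: optionally augment the query with a generated reasoning
    # trace before embedding, bridging the gap to document vocabulary.
    if reasoner is not None:
        trace = reasoner("Describe what a document answering this query "
                         "would contain: " + query)
        query = query + "\n" + trace
    return embed([query])
```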

Training Details

Training Data

The model was trained using supervised fine-tuning (SFT) on retrieval-focused datasets.

Training Procedure

During the SFT phase, training uses a multi-objective joint optimization strategy with the following loss function:

$$L_{\text{total}} = 1 \cdot L_{\text{SFT}} + 1 \cdot L_{\text{InfoNCE}} + 2 \cdot L_{\text{TripletMargin}}$$

This simultaneously optimizes three objectives, sketched in code after the list:

  • Language Modeling (L_SFT): Maintains the generative reasoning capabilities of the base model
  • Contrastive Learning (L_InfoNCE): Enhances semantic discrimination between relevant and irrelevant pairs
  • Triplet Constraints (L_TripletMargin): Strengthens the relative positioning of query-document pairs
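
A minimal PyTorch sketch of this joint objective is given below, using the weights 1, 1, 2 from the formula above. The in-batch negative construction, temperature, and margin are assumptions, not published hyperparameters.

```python
# Sketch of the joint SFT objective; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def joint_loss(q, d_pos, d_neg, lm_logits, lm_labels, tau=0.05, margin=0.2):
    # L_SFT: next-token cross-entropy on the SFT text, preserving the base
    # model's generative reasoning capabilities.
    l_sft = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                            lm_labels.view(-1), ignore_index=-100)

    # L_InfoNCE: in-batch contrastive loss; each query's positive document
    # competes against every other document in the batch.
    logits = q @ d_pos.T / tau                  # (B, B) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    l_infonce = F.cross_entropy(logits, targets)

    # L_TripletMargin: keep the positive closer than an explicit hard
    # negative by at least `margin`, in cosine distance on unit vectors.
    l_triplet = F.triplet_margin_with_distance_loss(
        q, d_pos, d_neg,
        distance_function=lambda a, b: 1 - F.cosine_similarity(a, b),
        margin=margin)

    return 1 * l_sft + 1 * l_infonce + 2 * l_triplet
```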

Key Highlights

  1. Reasoning-Enhanced Embeddings: Leverages the generative reasoning capabilities of the LLM base to enrich query embeddings, bridging the semantic gap between the original query and its target documents and yielding significant gains on reasoning-intensive retrieval tasks.

  2. Multi-Objective Optimization: The joint loss function ensures that the model maintains its reasoning capabilities while learning effective retrieval representations.

  3. Asymmetric Architecture: The dual-tower design allows for specialized optimization of query and document encoders based on their distinct characteristics and usage patterns.

Evaluation

Benchmark Results

The model was evaluated on BRIGHT, a benchmark for reasoning-intensive retrieval.

| Model | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CRE-T1-SFT-preview-1202 | 48.9 | 47.4 | 26.9 | 42.7 | 26.8 | 30.6 | 31.1 | 16.9 | 5.4 | 3.1 | 21.7 | 34.8 | 28.0 |

Bio. through Sus. are the StackExchange splits; Leet. and Pony are the Coding splits; AoPS, TheoQ., and TheoT. are the Theorem-based splits.