MedSwin/MedSwin-Reranker-bge-gemma — Fine-tuned Biomedical & EMR Context Ranking

Overview

  1. RAG Context Reranking
    Re-rank candidate passages retrieved from a VectorDB (initial recall via embeddings), improving final context selection for downstream medical LLM reasoning.

  2. EMR Profile Reranking
    Re-rank patient historical information (e.g., past assessments, diagnoses, medications) to surface the most clinically relevant records for a given current assessment.

The reranker outputs a direct relevance score for each (query, passage) pair and can be used as a drop-in “second-stage” ranking component after embedding-based retrieval.
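A minimal scoring sketch, assuming this checkpoint keeps the base model's FlagEmbedding interface (the query/passage strings below are illustrative):

from FlagEmbedding import FlagLLMReranker

# Load the fine-tuned reranker; use_fp16 speeds up GPU inference.
reranker = FlagLLMReranker("MedSwin/MedSwin-Reranker-bge-gemma", use_fp16=True)

# Score a single (query, passage) pair; higher scores mean higher relevance.
score = reranker.compute_score([
    "What is first-line pharmacologic therapy for type 2 diabetes?",
    "Metformin is recommended as the initial pharmacologic agent for most patients with type 2 diabetes.",
])
print(score)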


Why a Reranker?

Embedding retrieval is fast and scalable but may miss nuanced relevance (clinical relationships, subtle terminology, long context dependencies).
A reranker improves precision by explicitly scoring each candidate passage against the query, typically yielding better top-k context for medical QA and decision support.
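A two-stage sketch under the same interface assumption; in practice the candidate list would be the top-N hits from embedding retrieval rather than the hard-coded examples shown here:

from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker("MedSwin/MedSwin-Reranker-bge-gemma", use_fp16=True)

def rerank(query: str, candidates: list[str], top_k: int = 5):
    # Stage 2: score each (query, candidate) pair and keep the best top_k.
    scores = reranker.compute_score([[query, c] for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Stand-in for stage-1 output (embedding recall from a vector database).
candidates = [
    "Diabetic ketoacidosis presents with polyuria, polydipsia, and Kussmaul breathing.",
    "Annual influenza vaccination is recommended for most adults.",
]
print(rerank("What are the signs of diabetic ketoacidosis?", candidates, top_k=1))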


Base Model

  • Model: BAAI/bge-reranker-v2-gemma
  • Fine-tuning strategy: LoRA (parameter-efficient fine-tuning) with gradient checkpointing and mixed precision (fp16/bf16 depending on GPU); a configuration sketch follows this list.
  • Rationale: Gemma-based rerankers generally provide strong relevance modeling and support longer contexts compared to smaller rerankers.
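A minimal LoRA setup sketch with Hugging Face peft; the rank, target modules, and dtype choices below are illustrative assumptions, not the exact training configuration:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# bge-reranker-v2-gemma is scored as a causal LM, so it loads as one.
base = AutoModelForCausalLM.from_pretrained(
    "BAAI/bge-reranker-v2-gemma",
    torch_dtype=torch.bfloat16,  # fall back to torch.float16 on GPUs without bf16
)
base.gradient_checkpointing_enable()  # trade recompute for activation memory

# Hypothetical adapter hyperparameters; the real run may differ.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable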

Training Data (Offline, Local)

We fine-tune using open Hugging Face datasets stored locally on the HPC cluster:

1) BioASQ (Generated Queries)

  • Used as: (query, document) positives; negatives sampled from a rolling buffer.
  • Specialises the model for the complex terminology and high precision required by Task B (Biomedical Semantic QA). The reranker acts as a critical second stage in a two-stage retrieval system, filtering initial candidate lists from a PubMed-indexed retriever so that the highest-ranked documents contain the specific evidence needed for factoid and 'ideal' answer generation.

2) MIRIAD (Medical IR Instruction Dataset)

  • Used as: (question → passage) positives; negatives sampled from a rolling buffer.
  • Trained on MIRIAD's 4.4M literature-grounded QA pairs, the model learns to distinguish between highly similar clinical concepts. This specialisation reduces medical hallucinations and ensures that the most scientifically accurate evidence is prioritised in a multi-stage retrieval pipeline for healthcare professionals.

3) SciDocs

  • Used as: multi-task relevance signals, including citation prediction and co-citation analysis, from which the model learns to capture nuanced semantic relationships that standard bi-encoders miss. The resulting reranker serves as a high-accuracy second stage in a two-stage retrieval pipeline, significantly improving top-k relevance for complex scholarly queries.

Methodology

Data Construction (Triplets)

The training corpus is converted into reranker triplets:

{
  "query": "clinical question",
  "pos": ["relevant passage 1", "relevant passage 2"],
  "neg": ["irrelevant passage A", "irrelevant passage B"],
  "source": "dataset_name"
}
  • Positives: from dataset relevance labels or paired question–passage examples.
  • Negatives: sampled from an in-memory rolling buffer (fast, scalable offline); see the sketch after this list.
  • Output splits: train / val / test created in one run.
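A minimal sketch of the rolling-buffer negative sampling, assuming negatives are drawn uniformly from recently seen passages (buffer size and negative count are illustrative):

import random
from collections import deque

BUFFER_SIZE = 10_000  # keeps memory bounded for offline runs
buffer = deque(maxlen=BUFFER_SIZE)

def build_triplet(query, positives, num_neg=4, source="dataset_name"):
    # Negatives come from passages seen earlier in the stream,
    # excluding this example's own positives.
    pool = [p for p in buffer if p not in positives]
    negatives = random.sample(pool, k=min(num_neg, len(pool)))
    buffer.extend(positives)  # positives become candidates for later examples
    return {"query": query, "pos": positives, "neg": negatives, "source": source}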

Evaluation

Evaluation computes IR ranking metrics by scoring each query against its pooled (pos + neg) candidates; a computation sketch follows the list:

  • nDCG@10: 0.60+
  • MRR@10: 0.50+
  • MAP@10: 0.40+
  • Hit@1: 0.40+
  • Metrics reported overall and broken down by data source.
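A self-contained sketch of the per-query metrics, assuming binary relevance labels (1 for each positive, 0 for each negative) already sorted by descending reranker score; MAP@k follows the same pattern:

import math

def ranking_metrics(relevance, k=10):
    # relevance: binary labels sorted by descending reranker score.
    top = relevance[:k]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(top))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return {
        "ndcg@k": dcg / idcg if idcg > 0 else 0.0,
        "mrr@k": next((1 / (i + 1) for i, rel in enumerate(top) if rel), 0.0),
        "hit@1": float(relevance[0] == 1),
    }

# Example: the single positive was ranked 2nd among five candidates.
print(ranking_metrics([0, 1, 0, 0, 0], k=10))  # nDCG@10 ≈ 0.63, MRR@10 = 0.5, Hit@1 = 0.0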