SentenceTransformer: ModernBERT-small-v2
ModernBERT-small-v2 is a compact, accurate dense vector encoder. It combines a small ModernBERT architecture, simple MLM pre-training, and distillation from a larger, performant model to deliver strong performance at a much lower computational cost than standard large models.
Key Features & Training Methodology
This model was created using a specialized four-stage pipeline:
Deep & Narrow Architecture: Unlike typical small models (e.g., 6 layers), this student model features 12 Transformer layers but operates within a narrow 384-dimensional embedding space. The depth supports the multi-hop reasoning needed for high-accuracy retrieval tasks, while the narrow dimension keeps encoding fast and index sizes small.
Guided Initialization (GUIDE): The model did not start from random weights. It inherited structural and semantic knowledge from a larger teacher model (answerdotai/ModernBERT-base) via Principal Component Analysis (PCA) projection. This technique compressed the teacher's 768-dimensional knowledge into the student's 384-dimensional space, giving the student a substantial "head start" (see the sketch after this list).
Extensive MLM Pre-training: Following initialization, the model underwent comprehensive Masked Language Modeling (MLM) pre-training on a highly diverse corpus combining:
- Search Data (MS MARCO)
- Academic Texts (Stanford Philosophy)
- General Knowledge (NPR, FineWiki)
Knowledge Distillation (STS Tuning): The final, critical stage optimized the model for semantic similarity. It was trained to mimic the output embeddings of a powerful retrieval teacher (Alibaba-NLP/gte-modernbert-base) using Mean Squared Error (MSE) loss. This specialized tuning ensures its 384-dimensional vectors excel at similarity and retrieval tasks.
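To make the GUIDE step concrete, here is a minimal sketch of projecting teacher weights into the student's space with PCA. This is an illustrative assumption, not the published training code; the student copy step is shown as a comment.

import torch
from sklearn.decomposition import PCA
from transformers import AutoModel

# Load the teacher whose knowledge will be compressed.
teacher = AutoModel.from_pretrained("answerdotai/ModernBERT-base")

# Fit PCA on the teacher's 768-d token embeddings to find the top 384 directions.
teacher_emb = teacher.get_input_embeddings().weight.detach().cpu().numpy()  # [vocab, 768]
pca = PCA(n_components=384).fit(teacher_emb)

# Project into the student's 384-d space; a ModernBERT student with
# hidden_size=384 could then copy these weights as its initialization:
student_emb = torch.from_numpy(pca.transform(teacher_emb))  # [vocab, 384]
# student.get_input_embeddings().weight.data.copy_(student_emb)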
Training
The final model, ModernBERT-small-v2, was trained on a curated combination of four distinct datasets, used across the MLM pre-training and distillation stages, to ensure broad general knowledge acquisition before the final distillation tuning.
The following datasets were integrated and processed:
- MS MARCO Triplets (sentence-transformers/msmarco-msmarco-MiniLM-L6-v3, "triplet" split)
  - Source Focus: Query/document ranking (search relevance).
- Stanford Encyclopedia of Philosophy Triplets (johnnyboycurtis/Philosophical-Triplets-Retrieval)
  - Source Focus: Deep, technical, and abstract academic reasoning.
- NPR Articles (sentence-transformers/npr)
  - Source Focus: Modern news, journalistic style, and general current events.
- FineWiki (English) (HuggingFaceFW/finewiki, "en" split)
  - Source Focus: Encyclopedic, factual knowledge spanning a wide range of topics.
  - Note: only used in distillation training; not used in MLM.
(Note: During the final Knowledge Distillation phase, the training targets were embeddings generated by the teacher model (Alibaba-NLP/gte-modernbert-base) from the combined text of this merged corpus.)
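A minimal sketch of how such targets could be generated follows. The toy texts are placeholders, and the reduction of the teacher's 768-dimensional output to the 384-dimensional labels is an assumption; the card does not state the exact reduction mechanism.

from datasets import Dataset
from sentence_transformers import SentenceTransformer

# Encode the merged corpus with the retrieval teacher to obtain target vectors.
teacher = SentenceTransformer("Alibaba-NLP/gte-modernbert-base")
texts = ["example passage one", "example passage two"]  # stand-in for the merged corpus
teacher_vectors = teacher.encode(texts)  # shape [N, 768]

# The student regresses onto 384-d targets; how the teacher's 768-d vectors were
# reduced to 384 dims is not specified (PCA projection is one option).
labels = teacher_vectors[:, :384]  # placeholder reduction for illustration only
train_dataset = Dataset.from_dict({"text": texts, "label": labels.tolist()})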
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
  - parquet
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
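The Pooling module above uses mean pooling (pooling_mode_mean_tokens): each 384-dimensional sentence vector is the average of the token embeddings over non-padding positions. A minimal PyTorch equivalent, for illustration:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: [batch, seq_len, 384]; attention_mask: [batch, seq_len]
    mask = attention_mask.unsqueeze(-1).float()
    # Sum embeddings over real (non-padding) tokens and divide by their count.
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)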
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
import torch
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub. "flash_attention_2" requires the flash-attn package
# and a supported GPU; "sdpa" is a portable alternative.
model = SentenceTransformer(
    "johnnyboycurtis/ModernBERT-small-v2",
    model_kwargs={"attn_implementation": "flash_attention_2", "dtype": torch.bfloat16},  # or "sdpa"
)
# Run inference
sentences = [
'# Breda Holmes\nBreda Holmes is a former camogie player, winner of the B+I Star of the Year award in 1987 and seven All Ireland medals in succession between 1984 and 1991, celebrating the seventh by scoring the match-turning goal from Ann Downey’s sideline ball against Cork in the 1991 final.\n\n## Career\nShe captained Carysfort Training College in their 1984 Purcell Cup campaign and won six All Ireland club medals with St Paul’s camogie club, based in Kilkenny city.\n',
'What is Intellectual Property? Intellectual property (IP) refers to creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce. IP is protected in law by, for example, patents, copyright and trademarks, which enable people to earn recognition or financial benefit from what they invent or create.',
'10 Most Famous Soccer Stadiums in the World. The Camp Nou with its capacity of 99,354 is the largest stadium in Europe and also the fourth largest soccer stadium in the world. It is situated in Barcelona, Catalonia, Spain, and is the home of Spanish club Barcelona since 1957.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2616, 0.5490],
# [0.2616, 1.0000, 0.3196],
# [0.5490, 0.3196, 1.0000]])
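For retrieval-style usage, the same embeddings can drive a small semantic search setup. A minimal sketch using the library's util.semantic_search helper; the corpus and query below are placeholders.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-v2")

# Placeholder corpus and query for illustration.
corpus = [
    "The Camp Nou is the largest stadium in Europe.",
    "Patents and copyright protect creations of the mind.",
]
query = "What does intellectual property law cover?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus passages by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(corpus[hits[0][0]["corpus_id"]])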
Evaluation
Metrics
Knowledge Distillation
- Dataset: mse-dev
- Evaluated with MSEEvaluator (reports the negated MSE between student and teacher embeddings, so higher is better)
| Metric | Value |
|---|---|
| negative_mse | -77.74 |
Information Retrieval
- Datasets: NanoMSMARCO and NanoHotpotQA
- Evaluated with InformationRetrievalEvaluator
| Metric | NanoMSMARCO | NanoHotpotQA |
|---|---|---|
| cosine_accuracy@1 | 0.32 | 0.52 |
| cosine_accuracy@3 | 0.52 | 0.76 |
| cosine_accuracy@5 | 0.6 | 0.78 |
| cosine_accuracy@10 | 0.76 | 0.84 |
| cosine_precision@1 | 0.32 | 0.52 |
| cosine_precision@3 | 0.1733 | 0.3333 |
| cosine_precision@5 | 0.12 | 0.22 |
| cosine_precision@10 | 0.076 | 0.122 |
| cosine_recall@1 | 0.32 | 0.26 |
| cosine_recall@3 | 0.52 | 0.5 |
| cosine_recall@5 | 0.6 | 0.55 |
| cosine_recall@10 | 0.76 | 0.61 |
| cosine_ndcg@10 | 0.5251 | 0.5457 |
| cosine_mrr@10 | 0.4523 | 0.6494 |
| cosine_map@100 | 0.4624 | 0.4736 |
Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with NanoBEIREvaluator with these parameters:
  {
      "dataset_names": ["MSMARCO", "HotpotQA"],
      "dataset_id": "sentence-transformers/NanoBEIR-en"
  }
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.42 |
| cosine_accuracy@3 | 0.64 |
| cosine_accuracy@5 | 0.69 |
| cosine_accuracy@10 | 0.8 |
| cosine_precision@1 | 0.42 |
| cosine_precision@3 | 0.2533 |
| cosine_precision@5 | 0.17 |
| cosine_precision@10 | 0.099 |
| cosine_recall@1 | 0.29 |
| cosine_recall@3 | 0.51 |
| cosine_recall@5 | 0.575 |
| cosine_recall@10 | 0.685 |
| cosine_ndcg@10 | 0.5354 |
| cosine_mrr@10 | 0.5509 |
| cosine_map@100 | 0.468 |
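The NanoBEIR numbers above should be reproducible by running the evaluator directly; a minimal sketch follows (the lowercase dataset names follow the evaluator's convention and are an assumption).

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-v2")

# Evaluate on the same two NanoBEIR datasets reported in this card.
evaluator = NanoBEIREvaluator(dataset_names=["msmarco", "hotpotqa"])
results = evaluator(model)
print(results)  # per-dataset and mean metrics (accuracy@k, NDCG@10, MRR@10, ...)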
Training Details
Training Dataset
parquet
- Dataset: parquet
- Size: 3,375,201 training samples
- Columns: text and label
- Approximate statistics based on the first 1000 samples:

|  | text | label |
|---|---|---|
| type | string | list |
| details | min: 5 tokens, mean: 280.41 tokens, max: 1024 tokens | size: 384 elements |
- Samples:
  - text:
    # Scientists Link Diamonds To Earth's Quick Cooling
    Scientists say they have evidence the Earth was bombarded by meteors about 13,000 years ago, triggering a 1,000-year cold spell. Researchers write in the journal Science that they have found a layer of microscopic diamonds scattered across North America. An abrupt cooling may have caused many large mammals to become extinct.
    label: [4.6171875, 2.515625, 2.439453125, -1.4853515625, -6.328125, ...]
  - text:
    # Brad Giffen
    Brad Giffen is a retired Canadian news anchor who has worked on television in both Canada and the United States.
    Over his broadcasting career he has also worked as a radio personality, disc jockey, VJ, television reporter, television producer and voice-over artist.
    ## Broadcasting career
    Giffen studied at the Poynter Institute for Advanced Journalism Study. In the late 1980s he was a broadcaster on CHUM-FM radio station in Toronto, Ontario, Canada. He previously was John Majhor's successor veejay on CITY-TV's music video program Toronto Rocks, and he hosted the CBC Television battle of the bands competition Rock Wars.
    In 1990, Giffen pivoted to news journalism and became a reporter for CFTO's nightly news program World Beat News (later rebranded as CFTO News in early 1998, and CTV News in 2005).
    In 1993, Giffen moved to the United States and became co-anchor of the nightly news on the Fox affiliate KSTU, in Salt Lake City, Utah. Giffen left that post in 1995 to accept ...
    label: [-1.693359375, 13.3828125, 4.50390625, 0.41064453125, -2.884765625, ...]
  - text:
    # How Trump Won, According To The Exit Polls
    Donald Trump will be the next president of the United States. That's remarkable for all sorts of reasons: He has no governmental experience, for example. And many times during his campaign, Trump's words inflamed large swaths of Americans, whether it was his comments from years ago talking about grabbing women's genitals or calling Mexican immigrants in the U.S. illegally "rapists" and playing up crimes committed by immigrants, including drug crimes and murders. But right now, it's also remarkable because almost no one saw it coming. All major forecasters predicted a Hillary Clinton win, whether moderately or by a landslide. So what happened? We don't know just yet why pollsters and forecasters got it wrong, but here's what made this electorate so different from the one that elected Barack Obama by 4 points in 2012. To be clear, it's impossible to break any election results out into fully discrete demographic groups or trends — race, gend...
    label: [3.4296875, 12.828125, 2.8203125, -5.47265625, -5.390625, ...]
- Loss: MSELoss
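A minimal sketch of how a dataset in this text/label format pairs with MSELoss in the Sentence Transformers trainer; the student checkpoint and toy target vectors below are placeholders, not the actual training setup.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MSELoss

student = SentenceTransformer("johnnyboycurtis/ModernBERT-small-v2")  # placeholder student

# Toy dataset in the same text/label format (384-d target vectors).
train_dataset = Dataset.from_dict({
    "text": ["a passage", "another passage"],
    "label": [[0.0] * 384, [0.1] * 384],
})

# MSELoss regresses the student's embeddings onto the target vectors.
loss = MSELoss(model=student)
trainer = SentenceTransformerTrainer(model=student, train_dataset=train_dataset, loss=loss)
trainer.train()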
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- learning_rate: 0.0001
- num_train_epochs: 2
- warmup_steps: 0.1
- fp16: True
- load_best_model_at_end: True
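Expressed in code, these values correspond roughly to the following training-arguments sketch. The output directory is a placeholder, and warmup_steps is reproduced as reported, although a value of 0.1 reads more like a warmup ratio.

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/ModernBERT-small-v2",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=1e-4,
    num_train_epochs=2,
    warmup_steps=0.1,  # as reported in the card; likely intended as a warmup ratio
    fp16=True,
    load_best_model_at_end=True,
)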
All Hyperparameters
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.0001
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: None
- warmup_steps: 0.1
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | mse-dev_negative_mse | NanoMSMARCO_cosine_ndcg@10 | NanoHotpotQA_cosine_ndcg@10 | NanoBEIR_mean_cosine_ndcg@10 |
|---|---|---|---|---|---|---|
| 0.0019 | 100 | 4.2698 | - | - | - | - |
| 0.0038 | 200 | 4.2304 | - | - | - | - |
| 0.0057 | 300 | 4.1280 | - | - | - | - |
| 0.0076 | 400 | 3.8576 | - | - | - | - |
| 0.0095 | 500 | 3.1561 | - | - | - | - |
| 0.0114 | 600 | 2.5527 | - | - | - | - |
| 0.0133 | 700 | 2.3275 | - | - | - | - |
| 0.0152 | 800 | 2.2656 | - | - | - | - |
| 0.0171 | 900 | 2.2401 | - | - | - | - |
| 0.0190 | 1000 | 2.2256 | -221.2144 | 0.0514 | 0.0577 | 0.0545 |
| 0.0209 | 1100 | 2.2140 | - | - | - | - |
| 0.0228 | 1200 | 2.1920 | - | - | - | - |
| 0.0247 | 1300 | 2.1840 | - | - | - | - |
| 0.0265 | 1400 | 2.1662 | - | - | - | - |
| 0.0284 | 1500 | 2.1598 | - | - | - | - |
| 0.0303 | 1600 | 2.1452 | - | - | - | - |
| 0.0322 | 1700 | 2.1226 | - | - | - | - |
| 0.0341 | 1800 | 2.1068 | - | - | - | - |
| 0.0360 | 1900 | 2.0941 | - | - | - | - |
| 0.0379 | 2000 | 2.0796 | -206.8865 | 0.1481 | 0.0672 | 0.1077 |
| 0.0398 | 2100 | 2.0621 | - | - | - | - |
| 0.0417 | 2200 | 2.0545 | - | - | - | - |
| 0.0436 | 2300 | 2.0382 | - | - | - | - |
| 0.0455 | 2400 | 2.0267 | - | - | - | - |
| 0.0474 | 2500 | 2.0167 | - | - | - | - |
| 0.0493 | 2600 | 2.0041 | - | - | - | - |
| 0.0512 | 2700 | 1.9902 | - | - | - | - |
| 0.0531 | 2800 | 1.9746 | - | - | - | - |
| 0.0550 | 2900 | 1.9650 | - | - | - | - |
| 0.0569 | 3000 | 1.9539 | -194.5440 | 0.1243 | 0.1242 | 0.1243 |
| 0.0588 | 3100 | 1.9401 | - | - | - | - |
| 0.0607 | 3200 | 1.9317 | - | - | - | - |
| 0.0626 | 3300 | 1.9181 | - | - | - | - |
| 0.0645 | 3400 | 1.9098 | - | - | - | - |
| 0.0664 | 3500 | 1.8983 | - | - | - | - |
| 0.0683 | 3600 | 1.8924 | - | - | - | - |
| 0.0702 | 3700 | 1.8806 | - | - | - | - |
| 0.0721 | 3800 | 1.8717 | - | - | - | - |
| 0.0740 | 3900 | 1.8591 | - | - | - | - |
| 0.0758 | 4000 | 1.8525 | -184.2026 | 0.1647 | 0.1745 | 0.1696 |
| 0.0777 | 4100 | 1.8416 | - | - | - | - |
| 0.0796 | 4200 | 1.8359 | - | - | - | - |
| 0.0815 | 4300 | 1.8256 | - | - | - | - |
| 0.0834 | 4400 | 1.8131 | - | - | - | - |
| 0.0853 | 4500 | 1.8063 | - | - | - | - |
| 0.0872 | 4600 | 1.7950 | - | - | - | - |
| 0.0891 | 4700 | 1.7846 | - | - | - | - |
| 0.0910 | 4800 | 1.7762 | - | - | - | - |
| 0.0929 | 4900 | 1.7620 | - | - | - | - |
| 0.0948 | 5000 | 1.7605 | -175.1685 | 0.1960 | 0.2024 | 0.1992 |
| 0.0967 | 5100 | 1.7481 | - | - | - | - |
| 0.0986 | 5200 | 1.7419 | - | - | - | - |
| 0.1005 | 5300 | 1.7301 | - | - | - | - |
| 0.1024 | 5400 | 1.7280 | - | - | - | - |
| 0.1043 | 5500 | 1.7131 | - | - | - | - |
| 0.1062 | 5600 | 1.7063 | - | - | - | - |
| 0.1081 | 5700 | 1.6959 | - | - | - | - |
| 0.1100 | 5800 | 1.6884 | - | - | - | - |
| 0.1119 | 5900 | 1.6801 | - | - | - | - |
| 0.1138 | 6000 | 1.6700 | -166.4924 | 0.2493 | 0.2150 | 0.2321 |
| 0.1157 | 6100 | 1.6637 | - | - | - | - |
| 0.1176 | 6200 | 1.6543 | - | - | - | - |
| 0.1195 | 6300 | 1.6451 | - | - | - | - |
| 0.1214 | 6400 | 1.6382 | - | - | - | - |
| 0.1233 | 6500 | 1.6278 | - | - | - | - |
| 0.1251 | 6600 | 1.6235 | - | - | - | - |
| 0.1270 | 6700 | 1.6150 | - | - | - | - |
| 0.1289 | 6800 | 1.6054 | - | - | - | - |
| 0.1308 | 6900 | 1.6007 | - | - | - | - |
| 0.1327 | 7000 | 1.5874 | -158.1013 | 0.2809 | 0.2349 | 0.2579 |
| 0.1346 | 7100 | 1.5824 | - | - | - | - |
| 0.1365 | 7200 | 1.5724 | - | - | - | - |
| 0.1384 | 7300 | 1.5669 | - | - | - | - |
| 0.1403 | 7400 | 1.5535 | - | - | - | - |
| 0.1422 | 7500 | 1.5450 | - | - | - | - |
| 0.1441 | 7600 | 1.5345 | - | - | - | - |
| 0.1460 | 7700 | 1.5340 | - | - | - | - |
| 0.1479 | 7800 | 1.5242 | - | - | - | - |
| 0.1498 | 7900 | 1.5181 | - | - | - | - |
| 0.1517 | 8000 | 1.5086 | -150.1032 | 0.2957 | 0.2454 | 0.2705 |
| 0.1536 | 8100 | 1.5007 | - | - | - | - |
| 0.1555 | 8200 | 1.4950 | - | - | - | - |
| 0.1574 | 8300 | 1.4829 | - | - | - | - |
| 0.1593 | 8400 | 1.4780 | - | - | - | - |
| 0.1612 | 8500 | 1.4737 | - | - | - | - |
| 0.1631 | 8600 | 1.4603 | - | - | - | - |
| 0.1650 | 8700 | 1.4510 | - | - | - | - |
| 0.1669 | 8800 | 1.4500 | - | - | - | - |
| 0.1688 | 8900 | 1.4408 | - | - | - | - |
| 0.1707 | 9000 | 1.4372 | -142.8462 | 0.3033 | 0.2824 | 0.2929 |
| 0.1726 | 9100 | 1.4270 | - | - | - | - |
| 0.1744 | 9200 | 1.4233 | - | - | - | - |
| 0.1763 | 9300 | 1.4135 | - | - | - | - |
| 0.1782 | 9400 | 1.4074 | - | - | - | - |
| 0.1801 | 9500 | 1.3981 | - | - | - | - |
| 0.1820 | 9600 | 1.3919 | - | - | - | - |
| 0.1839 | 9700 | 1.3844 | - | - | - | - |
| 0.1858 | 9800 | 1.3741 | - | - | - | - |
| 0.1877 | 9900 | 1.3685 | - | - | - | - |
| 0.1896 | 10000 | 1.3668 | -135.7081 | 0.3194 | 0.3059 | 0.3127 |
| 0.1915 | 10100 | 1.3568 | - | - | - | - |
| 0.1934 | 10200 | 1.3505 | - | - | - | - |
| 0.1953 | 10300 | 1.3433 | - | - | - | - |
| 0.1972 | 10400 | 1.3338 | - | - | - | - |
| 0.1991 | 10500 | 1.3295 | - | - | - | - |
| 0.2010 | 10600 | 1.3275 | - | - | - | - |
| 0.2029 | 10700 | 1.3149 | - | - | - | - |
| 0.2048 | 10800 | 1.3119 | - | - | - | - |
| 0.2067 | 10900 | 1.3055 | - | - | - | - |
| 0.2086 | 11000 | 1.2952 | -129.2064 | 0.3109 | 0.3434 | 0.3272 |
| 0.2105 | 11100 | 1.2920 | - | - | - | - |
| 0.2124 | 11200 | 1.2851 | - | - | - | - |
| 0.2143 | 11300 | 1.2769 | - | - | - | - |
| 0.2162 | 11400 | 1.2747 | - | - | - | - |
| 0.2181 | 11500 | 1.2686 | - | - | - | - |
| 0.2200 | 11600 | 1.2684 | - | - | - | - |
| 0.2219 | 11700 | 1.2582 | - | - | - | - |
| 0.2237 | 11800 | 1.2582 | - | - | - | - |
| 0.2256 | 11900 | 1.2479 | - | - | - | - |
| 0.2275 | 12000 | 1.2418 | -123.6261 | 0.3439 | 0.3547 | 0.3493 |
| 0.2294 | 12100 | 1.2400 | - | - | - | - |
| 0.2313 | 12200 | 1.2330 | - | - | - | - |
| 0.2332 | 12300 | 1.2288 | - | - | - | - |
| 0.2351 | 12400 | 1.2230 | - | - | - | - |
| 0.2370 | 12500 | 1.2164 | - | - | - | - |
| 0.2389 | 12600 | 1.2157 | - | - | - | - |
| 0.2408 | 12700 | 1.2166 | - | - | - | - |
| 0.2427 | 12800 | 1.2045 | - | - | - | - |
| 0.2446 | 12900 | 1.2035 | - | - | - | - |
| 0.2465 | 13000 | 1.1968 | -118.8691 | 0.3282 | 0.3329 | 0.3306 |
| 0.2484 | 13100 | 1.1942 | - | - | - | - |
| 0.2503 | 13200 | 1.1895 | - | - | - | - |
| 0.2522 | 13300 | 1.1843 | - | - | - | - |
| 0.2541 | 13400 | 1.1755 | - | - | - | - |
| 0.2560 | 13500 | 1.1756 | - | - | - | - |
| 0.2579 | 13600 | 1.1707 | - | - | - | - |
| 0.2598 | 13700 | 1.1637 | - | - | - | - |
| 0.2617 | 13800 | 1.1684 | - | - | - | - |
| 0.2636 | 13900 | 1.1628 | - | - | - | - |
| 0.2655 | 14000 | 1.1585 | -115.4122 | 0.3779 | 0.3579 | 0.3679 |
| 0.2674 | 14100 | 1.1602 | - | - | - | - |
| 0.2693 | 14200 | 1.1504 | - | - | - | - |
| 0.2712 | 14300 | 1.1483 | - | - | - | - |
| 0.2730 | 14400 | 1.1488 | - | - | - | - |
| 0.2749 | 14500 | 1.1392 | - | - | - | - |
| 0.2768 | 14600 | 1.1343 | - | - | - | - |
| 0.2787 | 14700 | 1.1363 | - | - | - | - |
| 0.2806 | 14800 | 1.1342 | - | - | - | - |
| 0.2825 | 14900 | 1.1327 | - | - | - | - |
| 0.2844 | 15000 | 1.1219 | -111.9139 | 0.3794 | 0.3791 | 0.3793 |
| 0.2863 | 15100 | 1.1246 | - | - | - | - |
| 0.2882 | 15200 | 1.1152 | - | - | - | - |
| 0.2901 | 15300 | 1.1196 | - | - | - | - |
| 0.2920 | 15400 | 1.1097 | - | - | - | - |
| 0.2939 | 15500 | 1.1067 | - | - | - | - |
| 0.2958 | 15600 | 1.0994 | - | - | - | - |
| 0.2977 | 15700 | 1.1077 | - | - | - | - |
| 0.2996 | 15800 | 1.1057 | - | - | - | - |
| 0.3015 | 15900 | 1.0949 | - | - | - | - |
| 0.3034 | 16000 | 1.0981 | -109.2994 | 0.3867 | 0.3855 | 0.3861 |
| 0.3053 | 16100 | 1.0933 | - | - | - | - |
| 0.3072 | 16200 | 1.0873 | - | - | - | - |
| 0.3091 | 16300 | 1.0851 | - | - | - | - |
| 0.3110 | 16400 | 1.0840 | - | - | - | - |
| 0.3129 | 16500 | 1.0831 | - | - | - | - |
| 0.3148 | 16600 | 1.0755 | - | - | - | - |
| 0.3167 | 16700 | 1.0733 | - | - | - | - |
| 0.3186 | 16800 | 1.0724 | - | - | - | - |
| 0.3205 | 16900 | 1.0698 | - | - | - | - |
| 0.3223 | 17000 | 1.0710 | -106.3769 | 0.4092 | 0.4066 | 0.4079 |
| 0.3242 | 17100 | 1.0699 | - | - | - | - |
| 0.3261 | 17200 | 1.0642 | - | - | - | - |
| 0.3280 | 17300 | 1.0576 | - | - | - | - |
| 0.3299 | 17400 | 1.0597 | - | - | - | - |
| 0.3318 | 17500 | 1.0572 | - | - | - | - |
| 0.3337 | 17600 | 1.0547 | - | - | - | - |
| 0.3356 | 17700 | 1.0502 | - | - | - | - |
| 0.3375 | 17800 | 1.0467 | - | - | - | - |
| 0.3394 | 17900 | 1.0485 | - | - | - | - |
| 0.3413 | 18000 | 1.0455 | -103.7698 | 0.4510 | 0.4237 | 0.4374 |
| 0.3432 | 18100 | 1.0433 | - | - | - | - |
| 0.3451 | 18200 | 1.0404 | - | - | - | - |
| 0.3470 | 18300 | 1.0397 | - | - | - | - |
| 0.3489 | 18400 | 1.0352 | - | - | - | - |
| 0.3508 | 18500 | 1.0318 | - | - | - | - |
| 0.3527 | 18600 | 1.0302 | - | - | - | - |
| 0.3546 | 18700 | 1.0330 | - | - | - | - |
| 0.3565 | 18800 | 1.0220 | - | - | - | - |
| 0.3584 | 18900 | 1.0223 | - | - | - | - |
| 0.3603 | 19000 | 1.0254 | -101.5743 | 0.4439 | 0.4265 | 0.4352 |
| 0.3622 | 19100 | 1.0186 | - | - | - | - |
| 0.3641 | 19200 | 1.0216 | - | - | - | - |
| 0.3660 | 19300 | 1.0152 | - | - | - | - |
| 0.3679 | 19400 | 1.0139 | - | - | - | - |
| 0.3698 | 19500 | 1.0125 | - | - | - | - |
| 0.3716 | 19600 | 1.0087 | - | - | - | - |
| 0.3735 | 19700 | 1.0045 | - | - | - | - |
| 0.3754 | 19800 | 1.0032 | - | - | - | - |
| 0.3773 | 19900 | 1.0013 | - | - | - | - |
| 0.3792 | 20000 | 1.0017 | -99.6613 | 0.4554 | 0.4374 | 0.4464 |
| 0.3811 | 20100 | 1.0007 | - | - | - | - |
| 0.3830 | 20200 | 0.9959 | - | - | - | - |
| 0.3849 | 20300 | 0.9965 | - | - | - | - |
| 0.3868 | 20400 | 0.9909 | - | - | - | - |
| 0.3887 | 20500 | 0.9902 | - | - | - | - |
| 0.3906 | 20600 | 0.9903 | - | - | - | - |
| 0.3925 | 20700 | 0.9927 | - | - | - | - |
| 0.3944 | 20800 | 0.9865 | - | - | - | - |
| 0.3963 | 20900 | 0.9843 | - | - | - | - |
| 0.3982 | 21000 | 0.9809 | -97.4922 | 0.4689 | 0.4462 | 0.4575 |
| 0.4001 | 21100 | 0.9801 | - | - | - | - |
| 0.4020 | 21200 | 0.9785 | - | - | - | - |
| 0.4039 | 21300 | 0.9718 | - | - | - | - |
| 0.4058 | 21400 | 0.9725 | - | - | - | - |
| 0.4077 | 21500 | 0.9705 | - | - | - | - |
| 0.4096 | 21600 | 0.9729 | - | - | - | - |
| 0.4115 | 21700 | 0.9714 | - | - | - | - |
| 0.4134 | 21800 | 0.9647 | - | - | - | - |
| 0.4153 | 21900 | 0.9623 | - | - | - | - |
| 0.4172 | 22000 | 0.9579 | -95.7813 | 0.4642 | 0.4549 | 0.4595 |
| 0.4191 | 22100 | 0.9553 | - | - | - | - |
| 0.4209 | 22200 | 0.9558 | - | - | - | - |
| 0.4228 | 22300 | 0.9584 | - | - | - | - |
| 0.4247 | 22400 | 0.9544 | - | - | - | - |
| 0.4266 | 22500 | 0.9520 | - | - | - | - |
| 0.4285 | 22600 | 0.9516 | - | - | - | - |
| 0.4304 | 22700 | 0.9543 | - | - | - | - |
| 0.4323 | 22800 | 0.9502 | - | - | - | - |
| 0.4342 | 22900 | 0.9477 | - | - | - | - |
| 0.4361 | 23000 | 0.9405 | -93.9238 | 0.4856 | 0.4521 | 0.4688 |
| 0.4380 | 23100 | 0.9448 | - | - | - | - |
| 0.4399 | 23200 | 0.9424 | - | - | - | - |
| 0.4418 | 23300 | 0.9369 | - | - | - | - |
| 0.4437 | 23400 | 0.9318 | - | - | - | - |
| 0.4456 | 23500 | 0.9342 | - | - | - | - |
| 0.4475 | 23600 | 0.9392 | - | - | - | - |
| 0.4494 | 23700 | 0.9358 | - | - | - | - |
| 0.4513 | 23800 | 0.9303 | - | - | - | - |
| 0.4532 | 23900 | 0.9306 | - | - | - | - |
| 0.4551 | 24000 | 0.9277 | -92.2427 | 0.4946 | 0.4798 | 0.4872 |
| 0.4570 | 24100 | 0.9267 | - | - | - | - |
| 0.4589 | 24200 | 0.9228 | - | - | - | - |
| 0.4608 | 24300 | 0.9239 | - | - | - | - |
| 0.4627 | 24400 | 0.9225 | - | - | - | - |
| 0.4646 | 24500 | 0.9169 | - | - | - | - |
| 0.4665 | 24600 | 0.9170 | - | - | - | - |
| 0.4684 | 24700 | 0.9195 | - | - | - | - |
| 0.4702 | 24800 | 0.9153 | - | - | - | - |
| 0.4721 | 24900 | 0.9138 | - | - | - | - |
| 0.4740 | 25000 | 0.9108 | -90.7635 | 0.4622 | 0.4812 | 0.4717 |
| 0.4759 | 25100 | 0.9133 | - | - | - | - |
| 0.4778 | 25200 | 0.9076 | - | - | - | - |
| 0.4797 | 25300 | 0.9081 | - | - | - | - |
| 0.4816 | 25400 | 0.9093 | - | - | - | - |
| 0.4835 | 25500 | 0.9037 | - | - | - | - |
| 0.4854 | 25600 | 0.9025 | - | - | - | - |
| 0.4873 | 25700 | 0.9058 | - | - | - | - |
| 0.4892 | 25800 | 0.9018 | - | - | - | - |
| 0.4911 | 25900 | 0.9014 | - | - | - | - |
| 0.4930 | 26000 | 0.8946 | -89.2562 | 0.4745 | 0.4957 | 0.4851 |
| 0.4949 | 26100 | 0.8982 | - | - | - | - |
| 0.4968 | 26200 | 0.8946 | - | - | - | - |
| 0.4987 | 26300 | 0.8941 | - | - | - | - |
| 0.5006 | 26400 | 0.8925 | - | - | - | - |
| 0.5025 | 26500 | 0.8947 | - | - | - | - |
| 0.5044 | 26600 | 0.8906 | - | - | - | - |
| 0.5063 | 26700 | 0.8895 | - | - | - | - |
| 0.5082 | 26800 | 0.8866 | - | - | - | - |
| 0.5101 | 26900 | 0.8840 | - | - | - | - |
| 0.5120 | 27000 | 0.8764 | -87.8039 | 0.5011 | 0.5173 | 0.5092 |
| 0.5139 | 27100 | 0.8859 | - | - | - | - |
| 0.5158 | 27200 | 0.8839 | - | - | - | - |
| 0.5177 | 27300 | 0.8794 | - | - | - | - |
| 0.5195 | 27400 | 0.8790 | - | - | - | - |
| 0.5214 | 27500 | 0.8788 | - | - | - | - |
| 0.5233 | 27600 | 0.8780 | - | - | - | - |
| 0.5252 | 27700 | 0.8749 | - | - | - | - |
| 0.5271 | 27800 | 0.8742 | - | - | - | - |
| 0.5290 | 27900 | 0.8700 | - | - | - | - |
| 0.5309 | 28000 | 0.8691 | -86.4419 | 0.4936 | 0.4776 | 0.4856 |
| 0.5328 | 28100 | 0.8747 | - | - | - | - |
| 0.5347 | 28200 | 0.8644 | - | - | - | - |
| 0.5366 | 28300 | 0.8673 | - | - | - | - |
| 0.5385 | 28400 | 0.8670 | - | - | - | - |
| 0.5404 | 28500 | 0.8638 | - | - | - | - |
| 0.5423 | 28600 | 0.8649 | - | - | - | - |
| 0.5442 | 28700 | 0.8629 | - | - | - | - |
| 0.5461 | 28800 | 0.8629 | - | - | - | - |
| 0.5480 | 28900 | 0.8591 | - | - | - | - |
| 0.5499 | 29000 | 0.8566 | -85.0408 | 0.4792 | 0.4918 | 0.4855 |
| 0.5518 | 29100 | 0.8588 | - | - | - | - |
| 0.5537 | 29200 | 0.8545 | - | - | - | - |
| 0.5556 | 29300 | 0.8534 | - | - | - | - |
| 0.5575 | 29400 | 0.8543 | - | - | - | - |
| 0.5594 | 29500 | 0.8534 | - | - | - | - |
| 0.5613 | 29600 | 0.8519 | - | - | - | - |
| 0.5632 | 29700 | 0.8486 | - | - | - | - |
| 0.5651 | 29800 | 0.8530 | - | - | - | - |
| 0.5670 | 29900 | 0.8477 | - | - | - | - |
| 0.5688 | 30000 | 0.8465 | -83.9435 | 0.4986 | 0.5097 | 0.5042 |
| 0.5707 | 30100 | 0.8425 | - | - | - | - |
| 0.5726 | 30200 | 0.8437 | - | - | - | - |
| 0.5745 | 30300 | 0.8430 | - | - | - | - |
| 0.5764 | 30400 | 0.8431 | - | - | - | - |
| 0.5783 | 30500 | 0.8424 | - | - | - | - |
| 0.5802 | 30600 | 0.8403 | - | - | - | - |
| 0.5821 | 30700 | 0.8347 | - | - | - | - |
| 0.5840 | 30800 | 0.8344 | - | - | - | - |
| 0.5859 | 30900 | 0.8348 | - | - | - | - |
| 0.5878 | 31000 | 0.8351 | -82.8113 | 0.4999 | 0.5088 | 0.5043 |
| 0.5897 | 31100 | 0.8362 | - | - | - | - |
| 0.5916 | 31200 | 0.8307 | - | - | - | - |
| 0.5935 | 31300 | 0.8315 | - | - | - | - |
| 0.5954 | 31400 | 0.8311 | - | - | - | - |
| 0.5973 | 31500 | 0.8305 | - | - | - | - |
| 0.5992 | 31600 | 0.8304 | - | - | - | - |
| 0.6011 | 31700 | 0.8277 | - | - | - | - |
| 0.6030 | 31800 | 0.8249 | - | - | - | - |
| 0.6049 | 31900 | 0.8262 | - | - | - | - |
| 0.6068 | 32000 | 0.8236 | -81.7389 | 0.4811 | 0.5256 | 0.5034 |
| 0.6087 | 32100 | 0.8209 | - | - | - | - |
| 0.6106 | 32200 | 0.8226 | - | - | - | - |
| 0.6125 | 32300 | 0.8207 | - | - | - | - |
| 0.6144 | 32400 | 0.8224 | - | - | - | - |
| 0.6163 | 32500 | 0.8163 | - | - | - | - |
| 0.6182 | 32600 | 0.8181 | - | - | - | - |
| 0.6200 | 32700 | 0.8147 | - | - | - | - |
| 0.6219 | 32800 | 0.8170 | - | - | - | - |
| 0.6238 | 32900 | 0.8156 | - | - | - | - |
| 0.6257 | 33000 | 0.8141 | -80.4979 | 0.5042 | 0.5085 | 0.5064 |
| 0.6276 | 33100 | 0.8088 | - | - | - | - |
| 0.6295 | 33200 | 0.8098 | - | - | - | - |
| 0.6314 | 33300 | 0.8133 | - | - | - | - |
| 0.6333 | 33400 | 0.8087 | - | - | - | - |
| 0.6352 | 33500 | 0.8086 | - | - | - | - |
| 0.6371 | 33600 | 0.8094 | - | - | - | - |
| 0.6390 | 33700 | 0.8054 | - | - | - | - |
| 0.6409 | 33800 | 0.8043 | - | - | - | - |
| 0.6428 | 33900 | 0.8035 | - | - | - | - |
| 0.6447 | 34000 | 0.7990 | -79.5726 | 0.4990 | 0.5166 | 0.5078 |
| 0.6466 | 34100 | 0.8035 | - | - | - | - |
| 0.6485 | 34200 | 0.7990 | - | - | - | - |
| 0.6504 | 34300 | 0.7996 | - | - | - | - |
| 0.6523 | 34400 | 0.8005 | - | - | - | - |
| 0.6542 | 34500 | 0.8000 | - | - | - | - |
| 0.6561 | 34600 | 0.7975 | - | - | - | - |
| 0.6580 | 34700 | 0.7959 | - | - | - | - |
| 0.6599 | 34800 | 0.7921 | - | - | - | - |
| 0.6618 | 34900 | 0.7916 | - | - | - | - |
| 0.6637 | 35000 | 0.7933 | -78.7884 | 0.5104 | 0.5139 | 0.5122 |
| 0.6656 | 35100 | 0.7908 | - | - | - | - |
| 0.6675 | 35200 | 0.7913 | - | - | - | - |
| 0.6693 | 35300 | 0.7921 | - | - | - | - |
| 0.6712 | 35400 | 0.7929 | - | - | - | - |
| 0.6731 | 35500 | 0.7915 | - | - | - | - |
| 0.6750 | 35600 | 0.7871 | - | - | - | - |
| 0.6769 | 35700 | 0.7836 | - | - | - | - |
| 0.6788 | 35800 | 0.7805 | - | - | - | - |
| 0.6807 | 35900 | 0.7870 | - | - | - | - |
| 0.6826 | 36000 | 0.7797 | -77.7400 | 0.5251 | 0.5457 | 0.5354 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 5.2.2
- Transformers: 5.1.0
- PyTorch: 2.7.1+cu128
- Accelerate: 1.9.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MSELoss
@inproceedings{reimers-2020-multilingual-sentence-bert,
title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2004.09813",
}
ModernBERT Model Architecture
@misc{warner2024smarterbetterfasterlonger,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
Model Weight Initialization
@misc{trinh2025guideguidedinitializationdistillation,
title={GUIDE: Guided Initialization and Distillation of Embeddings},
author={Khoa Trinh and Gaurav Menghani and Erik Vee},
year={2025},
eprint={2510.06502},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.06502},
}