A newer version of this model is available:
tokinasin/ruri-v3-70m-code-v0.2
This model is a fine-tuned version of cl-nagoya/ruri-v3-70m for retrieving semantically segmented code snippets using natural language queries.
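The card does not say how snippets were segmented for training. As a rough sketch only, assuming that "semantically segmented" means roughly function- or class-level units, Python source could be split into such snippets with the standard `ast` module before encoding. The helper name `split_into_snippets` and the sample source are hypothetical and are not part of this model's tooling.

```python
import ast


def split_into_snippets(source: str) -> list[str]:
    """Return the source text of each top-level function or class.

    Assumption: one snippet per top-level definition. Module-level
    statements (imports, constants) are skipped.
    """
    tree = ast.parse(source)
    snippets = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                snippets.append(segment)
    return snippets


# Hypothetical sample file contents
sample = '''import os

def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"
'''

snippets = split_into_snippets(sample)
for snippet in snippets:
    print(snippet)
    print("---")
```

Each resulting string can then be passed to `model.encode` in the same way as the `code_snippets` list in the usage example. Other languages would need their own parser.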
Supported Natural Languages
Japanese, English
Supported Programming Languages
C, CSharp, Cpp, Go, Java, JavaScript, PHP, Python, Ruby, Rust, SQL, Bash, Swift, TypeScript
Example Usage
Please refer to the original model for more details: https://huggingface.co/cl-nagoya/ruri-v3-70m
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("tokinasin/ruri-v3-70m-code-v0.1")

code_snippets = [
    """def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a""",
    """import numpy as np
def normalize(v):
    norm = np.linalg.norm(v)
    return v / norm if norm != 0 else v""",
    """def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True""",
]

# Japanese queries; description i corresponds to code snippet i
descriptions = [
    "フィボナッチ数列を計算する関数",  # A function that computes the Fibonacci sequence
    "ベクトルを正規化するユーティリティ",  # A utility that normalizes a vector
    "整数が素数かどうか判定する関数",  # A function that checks whether an integer is prime
]

# Encode
code_embeddings = model.encode(code_snippets, normalize_embeddings=True)
desc_embeddings = model.encode(descriptions, normalize_embeddings=True)

# Calculate similarities (rows: descriptions, columns: code snippets)
similarities = util.cos_sim(desc_embeddings, code_embeddings)

# Print results
print("\nSimilarity Matrix (Description → Code):")
print("=" * 60)
print(f"{'Description':<40} | Code #1 Code #2 Code #3")
print("-" * 60)
for i, desc in enumerate(descriptions):
    scores = [f"{similarities[i][j]:.4f}" for j in range(3)]
    best_match = similarities[i].argmax().item() + 1
    print(f"{desc[:37]:<40} | {scores[0]} {scores[1]} {scores[2]} → Best: #{best_match}")
print("=" * 60)

# Check accuracy: a description counts as correct if its top-scoring snippet has the same index
correct = 0
for i in range(len(descriptions)):
    if similarities[i].argmax().item() == i:
        correct += 1
accuracy = correct / len(descriptions)
print(f"\nAccuracy@1: {accuracy:.2%} ({correct}/{len(descriptions)})")
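Accuracy@1 only checks the top-ranked snippet. As a minimal sketch, assuming the same pairing convention as the example (description i matches snippet i), the helper below also reports Recall@k and mean reciprocal rank from a plain nested-list score matrix. The function name `retrieval_metrics` and the toy scores are hypothetical and are not output from this model. With the example above, `similarities.tolist()` could be passed in instead.

```python
def retrieval_metrics(similarities, k=1):
    """Return (recall_at_k, mrr) for a query-by-document score matrix.

    similarities[i][j] is the score of query i against document j.
    Assumption: document i is the correct match for query i.
    """
    hits = 0
    reciprocal_ranks = []
    for i, row in enumerate(similarities):
        # Document indices sorted by descending score
        ranking = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        rank = ranking.index(i) + 1  # 1-based rank of the correct document
        if rank <= k:
            hits += 1
        reciprocal_ranks.append(1.0 / rank)
    n = len(similarities)
    return hits / n, sum(reciprocal_ranks) / n


# Toy scores made up for illustration (not model output)
toy = [
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
    [0.2, 0.1, 0.7],
]

recall_at_1, mrr = retrieval_metrics(toy, k=1)
print(f"Recall@1: {recall_at_1:.2f}, MRR: {mrr:.2f}")
```

Recall@1 here is the same quantity as Accuracy@1 in the example. Larger k values are usually more informative once the candidate pool grows beyond a handful of snippets.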
Acknowledgements
This model is based on cl-nagoya/ruri-v3-70m. Many thanks to those who developed the excellent original model.
Model tree for tokinasin/ruri-v3-70m-code-v0.1
Base model: sbintuitions/modernbert-ja-70m
Finetuned: cl-nagoya/ruri-v3-pt-70m
Finetuned: cl-nagoya/ruri-v3-70m