Nice work & Recommendation
Hello!
I've just read through your model card, and this looks very impressive. I'm always glad to see low-resource languages get dedicated models!
I also have a recommendation: with modern sentence-transformers versions, you can use model.encode_query and model.encode_document, which automatically use the "query" and "document" prompt names. This means that users can do this instead:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LocalDoc/LocRet-small")

queries = ["Azərbaycanın paytaxtı hansı şəhərdir?"]
passages = [
    "Bakı Azərbaycan Respublikasının paytaxtı və ən böyük şəhəridir.",
    "Gəncə Azərbaycanın ikinci böyük şəhəridir.",
]

query_embeddings = model.encode_query(queries)
passage_embeddings = model.encode_document(passages)

similarities = model.similarity(query_embeddings, passage_embeddings)
print(similarities)
```
This works once you update these lines (https://huggingface.co/LocalDoc/LocRet-small/blob/main/config_sentence_transformers.json#L8-L11) to:
```json
"prompts": {
    "query": "query: ",
    "document": "passage: "
},
```
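If you would rather make this change in a script than by hand, here is a minimal sketch using only Python's standard `json` module. The helper name `with_retrieval_prompts` is hypothetical, and the starting config below is an illustrative in-memory dict, not the actual contents of the LocRet-small file. The commented-out lines assume a local copy of `config_sentence_transformers.json` in the working directory.

```python
import json


def with_retrieval_prompts(config: dict) -> dict:
    """Return a copy of a config dict with the "query"/"document" prompts set (hypothetical helper)."""
    updated = dict(config)  # shallow copy so the input dict is left untouched
    updated["prompts"] = {"query": "query: ", "document": "passage: "}
    return updated


# Illustrative starting config; a real file may contain other keys as well
original = {"prompts": {}, "default_prompt_name": None, "similarity_fn_name": "cosine"}
print(json.dumps(with_retrieval_prompts(original), indent=2, ensure_ascii=False))

# Applying it to a local file (the path is an assumption):
# with open("config_sentence_transformers.json", encoding="utf-8") as f:
#     cfg = json.load(f)
# with open("config_sentence_transformers.json", "w", encoding="utf-8") as f:
#     json.dump(with_retrieval_prompts(cfg), f, indent=2, ensure_ascii=False)
```

All other keys are carried over unchanged; only `prompts` is replaced.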
Optionally, you can also set default_prompt_name to "document". Then "passage: " will always be prepended when a user calls model.encode without a prompt or prompt_name. There are more details here: https://sbert.net/examples/sentence_transformer/applications/computing-embeddings/README.html#prompt-templates
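To make the precedence concrete, here is a simplified, hypothetical re-implementation of how a prompt might be chosen and prepended. It is a sketch, not the library's actual code. It assumes an explicit `prompt` string wins, then `prompt_name`, then `default_prompt_name`. Roughly speaking, `model.encode_query(texts)` corresponds to `prompt_name="query"` here, and `model.encode_document(texts)` corresponds to `prompt_name="document"`.

```python
from typing import Dict, List, Optional


def resolve_and_apply_prompt(
    texts: List[str],
    prompts: Dict[str, str],
    default_prompt_name: Optional[str] = None,
    prompt: Optional[str] = None,
    prompt_name: Optional[str] = None,
) -> List[str]:
    """Simplified sketch: pick a prompt string and prepend it to every text."""
    # An explicit `prompt` string wins; names are only consulted when it is None
    if prompt is None:
        if prompt_name is not None:
            # An explicitly requested prompt name must exist in the config
            if prompt_name not in prompts:
                raise ValueError(f"Prompt name {prompt_name!r} not found in {list(prompts)}")
            prompt = prompts[prompt_name]
        elif default_prompt_name is not None:
            # Fall back to the configured default prompt, if one is set
            prompt = prompts.get(default_prompt_name)
    if not prompt:
        return list(texts)  # no prompt resolved: texts are encoded unchanged
    return [prompt + text for text in texts]


prompts = {"query": "query: ", "document": "passage: "}
texts = ["Gəncə Azərbaycanın ikinci böyük şəhəridir."]

# With default_prompt_name="document", a bare call gets the "passage: " prefix
print(resolve_and_apply_prompt(texts, prompts, default_prompt_name="document"))
# An explicit prompt_name overrides the default
print(resolve_and_apply_prompt(texts, prompts, default_prompt_name="document", prompt_name="query"))
```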
- Tom Aarsen
Hi Tom,
Thank you for the kind words and the excellent recommendation!
I've updated config_sentence_transformers.json with the query and document prompt names, and set default_prompt_name to "document".
The model card usage example is also updated to use encode_query / encode_document.