Add Sentence Transformers integration
#1, opened by tomaarsen (HF Staff)
Hello!
Congratulations on your release! I wanted to experiment, so I worked on a simple Sentence Transformers integration.
Pull Request overview
- Add a Sentence Transformers module that relies on luxical, akin to the transformers integration.
- Update the README so it's clear that luxical, transformers, and sentence-transformers all work.
Details
You should be able to run this after a simple pip install sentence-transformers (the revision argument pulls straight from this PR branch):
from sentence_transformers import SentenceTransformer
example_text = "Luxical integrates with Huggingface."
luxical_one = SentenceTransformer("DatologyAI/luxical-one", revision="refs/pr/1", trust_remote_code=True)
print(luxical_one)
embeddings = luxical_one.encode(example_text)
print(embeddings[:5])
# tensor([-0.0061, 0.0410, -0.0388, -0.0276, 0.0245])
Some longer tests:
import torch
from sentence_transformers import SentenceTransformer
luxical_one = SentenceTransformer("DatologyAI/luxical-one", revision="refs/pr/1", trust_remote_code=True)
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
]
embeddings = luxical_one.encode(sentences)
embeddings = torch.tensor(embeddings)
print(embeddings.shape)
# torch.Size([3, 192])
similarities = embeddings @ embeddings.T
print(similarities)
'''
tensor([[1.0000, 0.8420, 0.5579],
[0.8420, 1.0000, 0.5876],
[0.5579, 0.5876, 1.0000]])
'''
luxical_one.save_pretrained("tmp")
model_fresh = SentenceTransformer("tmp", trust_remote_code=True)
print(model_fresh)
embeddings_fresh = model_fresh.encode(sentences)
embeddings_fresh = torch.tensor(embeddings_fresh)
print(torch.allclose(embeddings, embeddings_fresh))
The final check compares the original embeddings with those from the reloaded model, which also shows that the model saves and loads correctly.
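A note on the similarity matrix in the longer test: the diagonal prints as 1.0000, which suggests (but does not by itself confirm) that the embeddings come out L2-normalized, so that the plain dot product embeddings @ embeddings.T equals cosine similarity. If you would rather not rely on that, you can normalize explicitly. The snippet below is a minimal, dependency-free sketch of the idea on made-up toy vectors; the helper names l2_normalize and cosine_similarity_matrix are hypothetical and not part of luxical or Sentence Transformers.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities, computed as dot products of unit vectors."""
    unit = [l2_normalize(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Toy 2-dimensional vectors standing in for real embeddings (made-up values).
toy_embeddings = [[3.0, 4.0], [4.0, 3.0], [-4.0, 3.0]]
similarities = cosine_similarity_matrix(toy_embeddings)
for row in similarities:
    print([round(value, 4) for value in row])
```

With real embeddings, the same effect is usually achieved by normalizing the rows of the array before taking the matrix product.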
This should let you integrate directly with third-party libraries that build on Sentence Transformers, such as MTEB, LangChain, LlamaIndex, Haystack, and txtai. It also opens up the Sentence Transformers evaluators; the NanoBEIREvaluator in particular should be interesting.
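As a rough illustration of the evaluation route, here is a sketch of running NanoBEIREvaluator on the model. It is untested against this model and makes several assumptions: the function name evaluate_on_nanobeir is hypothetical, the two dataset names are just an example subset, and the sentence-transformers and datasets packages must be installed. Calling it downloads the model and the evaluation data.

```python
def evaluate_on_nanobeir(model_name="DatologyAI/luxical-one", revision=None):
    """Run a small NanoBEIR evaluation and return the results dictionary.

    Hypothetical helper for illustration. Imports are kept inside the function
    so that defining it does not require the optional dependencies.
    """
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.evaluation import NanoBEIREvaluator

    # trust_remote_code=True is needed because the model ships custom code.
    model = SentenceTransformer(model_name, revision=revision, trust_remote_code=True)

    # Example subset of NanoBEIR datasets (an assumption; choose your own).
    evaluator = NanoBEIREvaluator(dataset_names=["QuoraRetrieval", "MSMARCO"])
    results = evaluator(model)

    # The evaluator exposes the key of its headline metric as primary_metric.
    print(evaluator.primary_metric, results[evaluator.primary_metric])
    return results


if __name__ == "__main__":
    # Pass revision="refs/pr/1" to evaluate this PR branch specifically.
    evaluate_on_nanobeir()
```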
- Tom Aarsen
tomaarsen changed pull request status to open
lukemerrick changed pull request status to merged