Add Sentence Transformers integration

#1
by tomaarsen HF Staff - opened

Hello!

Congratulations on your release! I wanted to experiment, so I worked on a simple Sentence Transformers integration.

Pull Request overview

  • Add a Sentence Transformers module that relies on luxical, akin to the transformers integration.
  • Update the README so it's clear that luxical, transformers, and sentence-transformers all work.
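For readers unfamiliar with how a custom module plugs into Sentence Transformers, here is the general shape. The first module's tokenize() turns raw texts into a features dict. Each module's forward() receives that dict and returns it, and the final embeddings end up under the "sentence_embedding" key. The dependency-free sketch below mimics only that shape. The class name HashedBagOfWordsModule and its hashed bag-of-words embedding are hypothetical stand-ins for the luxical model, not the code in this PR. A real module would be a torch.nn.Module and would also handle saving and loading.

```python
import math
import zlib


class HashedBagOfWordsModule:
    """Hypothetical stand-in for a luxical-backed module (not the PR's code)."""

    def __init__(self, dim=8):
        self.dim = dim

    def tokenize(self, texts):
        # Sentence Transformers convention: the first module turns raw texts
        # into a features dict that is passed through the module pipeline.
        return {"tokens": [text.lower().split() for text in texts]}

    def forward(self, features):
        embeddings = []
        for tokens in features["tokens"]:
            vec = [0.0] * self.dim
            for tok in tokens:
                # crc32 is deterministic across processes, unlike built-in hash().
                vec[zlib.crc32(tok.encode("utf-8")) % self.dim] += 1.0
            # L2-normalize so dot products behave like cosine similarity.
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            embeddings.append([x / norm for x in vec])
        # Downstream code reads the final embeddings from this key.
        features["sentence_embedding"] = embeddings
        return features


module = HashedBagOfWordsModule(dim=8)
features = module.forward(
    module.tokenize(["The weather is lovely today.", "It's so sunny outside!"])
)
print(len(features["sentence_embedding"]), len(features["sentence_embedding"][0]))
```

Keeping the features dict as the shared interface is what lets such a module be chained with others, such as pooling or normalization layers, without either side knowing about the other's internals.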

Details

You should be able to run the following after a simple pip install sentence-transformers (the revision argument pulls the code straight from this PR branch):

from sentence_transformers import SentenceTransformer

example_text = "Luxical integrates with Huggingface."
luxical_one = SentenceTransformer("DatologyAI/luxical-one", revision="refs/pr/1", trust_remote_code=True)
print(luxical_one)

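# A single string yields a 1-D embedding; convert_to_tensor returns a torch tensor
embeddings = luxical_one.encode(example_text, convert_to_tensor=True)
print(embeddings[:5])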
# tensor([-0.0061,  0.0410, -0.0388, -0.0276,  0.0245])

Some longer tests:

import torch
from sentence_transformers import SentenceTransformer

luxical_one = SentenceTransformer("DatologyAI/luxical-one", revision="refs/pr/1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = luxical_one.encode(sentences)

embeddings = torch.tensor(embeddings)
print(embeddings.shape)
# torch.Size([3, 192])

similarities = embeddings @ embeddings.T
print(similarities)
'''
tensor([[1.0000, 0.8420, 0.5579],
        [0.8420, 1.0000, 0.5876],
        [0.5579, 0.5876, 1.0000]])
'''

luxical_one.save_pretrained("tmp")
model_fresh = SentenceTransformer("tmp", trust_remote_code=True)
print(model_fresh)

embeddings_fresh = model_fresh.encode(sentences)
embeddings_fresh = torch.tensor(embeddings_fresh)
print(torch.allclose(embeddings, embeddings_fresh))

The last part also shows that the model saves and reloads correctly.
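In the longer test, similarities = embeddings @ embeddings.T is a plain dot product. It equals cosine similarity there only because the embeddings appear to be unit-normalized, as the 1.0000 diagonal suggests. The minimal sketch below illustrates that equivalence in plain Python. The vectors are made-up toy values, not real luxical outputs.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings (hypothetical values).
a = [3.0, 4.0, 0.0]  # norm 5
b = [1.0, 2.0, 2.0]  # norm 3

raw_dot = dot(a, b)  # 3 + 8 + 0 = 11.0
cos = cosine(a, b)   # 11 / (5 * 3) = 0.7333...
normalized_dot = dot(l2_normalize(a), l2_normalize(b))  # same as cos

print(raw_dot, round(cos, 4), round(normalized_dot, 4))
```

If the embeddings were not normalized, recent Sentence Transformers versions (v3 and later) also provide luxical_one.similarity(embeddings, embeddings). It applies the model's configured similarity function, which is cosine by default.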

This should make it possible to integrate directly with third-party libraries that support Sentence Transformers, such as MTEB, LangChain, LlamaIndex, Haystack, txtai, etc. It should also work with the Sentence Transformers evaluators, and the NanoBEIREvaluator in particular should be interesting.

  • Tom Aarsen
tomaarsen changed pull request status to open
DatologyAI org

Thank you so much @tomaarsen !

lukemerrick changed pull request status to merged
