Instructions to use jeffwan/mmarco-mMiniLMv2-L12-H384-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use jeffwan/mmarco-mMiniLMv2-L12-H384-v1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-classification", model="jeffwan/mmarco-mMiniLMv2-L12-H384-v1")# Load model directly from transformers import AutoTokenizer, AutoModelForSequenceClassification tokenizer = AutoTokenizer.from_pretrained("jeffwan/mmarco-mMiniLMv2-L12-H384-v1") model = AutoModelForSequenceClassification.from_pretrained("jeffwan/mmarco-mMiniLMv2-L12-H384-v1") - Notebooks
- Google Colab
- Kaggle
Cross-Encoder for multilingual MS Marco
This model was trained on the MMARCO dataset. It is a machine translated version of MS MARCO using Google Translate. It was translated to 14 languages. In our experiments, we observed that it performs also well for other languages.
As a base model, we used the multilingual MiniLMv2 model.
The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco
Usage with SentenceTransformers
The usage becomes easy when you have SentenceTransformers installed. Then, you can use the pre-trained models like this:
from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name')
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2') , ('Query', 'Paragraph3')])
Usage with Transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model = AutoModelForSequenceClassification.from_pretrained('model_name')
tokenizer = AutoTokenizer.from_pretrained('model_name')
features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'], padding=True, truncation=True, return_tensors="pt")
model.eval()
with torch.no_grad():
scores = model(**features).logits
print(scores)
- Downloads last month
- 441