---
license: mit
language:
- en
tags:
- text-embeddings
- telecom
- domain-adaptation
- triplet-loss
- transformer
- semantic-search
- sentence-transformers
- domain-specific
- contrastive-learning
- simcse
- bio-bert
- don’t-stop-pretraining
metrics:
- name: Telecom Triplet Score
  type: accuracy
  value: 0.9380
  verified: false
- name: Average MTEB Score
  type: accuracy
  value: 0.825
  verified: false
- name: Average STS Score
  type: spearman
  value: 82.19
  verified: false
- name: AllNLI Triplet Score
  type: accuracy
  value: 0.6150
  verified: false
base_model:
- Alibaba-NLP/gte-Qwen2-1.5B-instruct
model-index:
- name: T-VEC
  results:
  - task:
      type: text-embedding
      name: Telecom Triplet Benchmark
    dataset:
      type: custom
      name: Telecom Triplet Benchmark
    metrics:
    - name: Telecom Triplet Score
      type: accuracy
      value: 0.9380
      verified: false
  - task:
      type: text-embedding
      name: MTEB Benchmark
    dataset:
      type: openai_humaneval
      name: MTEB Benchmark
    metrics:
    - name: Average MTEB Score
      type: accuracy
      value: 0.825
      verified: false
  - task:
      type: text-embedding
      name: STS Benchmark
    dataset:
      type: openai_humaneval
      name: STS Benchmark
    metrics:
    - name: Average STS Score
      type: spearman
      value: 82.19
      verified: false
  - task:
      type: text-embedding
      name: AllNLI Triplet
    dataset:
      type: openai_humaneval
      name: AllNLI Triplet
    metrics:
    - name: Triplet Score
      type: accuracy
      value: 0.6150
      verified: false
extra_gated_prompt: "Please provide answers to the below questions to gain access to the model"
extra_gated_fields:
  Company: text
  Full Name: text
  Email: text
  I want to use this model for:
    type: select
    options:
    - Research
    - Education
    - Commercial
    - label: Other
      value: other
---
# T-VEC: A Telecom-Specific Text Embedding Model

## Overview

**T-VEC (Telecom Vectorization Model)** is a domain-adapted text embedding model developed by NetoAI and fine-tuned from [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct). Using a deeply supervised triplet-loss approach, T-VEC learns semantic representations tailored to telecom use cases. It scores 0.9380 on a custom telecom triplet benchmark and remains competitive on standard benchmarks such as MTEB and STS.
## Model Details

- **Model Name**: T-VEC
- **Developer**: [NetoAI](https://www.netoai.ai)
- **Base Model**: Alibaba-NLP/gte-Qwen2-1.5B-instruct
- **Parameters**: 1.5 billion
- **Embedding Dimension**: 1536
- **Max Input Tokens**: 32,000
- **Languages**: Multilingual (optimized for English)
- **License**: MIT
- **Tokenizer**: Custom telecom-specific tokenizer (open-source)
## Intended Uses

- Semantic search over telecom documents (3GPP standards, vendor manuals)
- Fault log analysis for root-cause detection
- Telecom-specific chatbots and Q&A systems
- Regulatory compliance analysis and semantic auditing
## Training Details

- **Objective**: Triplet loss using cosine similarity
- **Dataset**: 100k+ telecom triplets curated by domain experts over 3 months
- **Layer Modification**: 338 transformer layers fine-tuned
- **Avg. L2 Norm Weight Change**: 0.7735
- **Enhancements**: Telecom-specific tokenizer and query-aware anchor strategies
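
The training objective above can be illustrated with a minimal sketch of a hinge-style triplet loss on cosine similarity. This is not NetoAI's training code: the exact loss form, the `margin` value, and the toy vectors are assumptions chosen only to show the idea that an anchor should sit closer to its positive than to its negative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on cosine similarity.

    The loss reaches zero once the anchor is more similar to the positive
    than to the negative by at least `margin` (an assumed hyperparameter,
    not a value taken from the model card).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
negative = [0.0, 1.0, 0.0]

print(triplet_loss(anchor, positive, negative))  # well-separated triplet
print(triplet_loss(anchor, negative, positive))  # swapped roles: large loss
```

A well-separated triplet contributes no loss, so under this formulation gradient updates come only from triplets the model still confuses.
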
## Evaluation Results

| Benchmark                 | Metric               | Score  |
|---------------------------|----------------------|--------|
| Telecom Triplet Benchmark | Accuracy             | 0.9380 |
| MTEB Benchmark            | Accuracy             | 0.825  |
| STS Benchmark             | Spearman Correlation | 82.19  |
| AllNLI Triplet            | Accuracy             | 0.6150 |
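
The model card does not define how triplet accuracy is computed. A common reading, assumed here, is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below uses made-up 2-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Made-up vectors; in practice each entry would be a model embedding.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer
    ([1.0, 0.0], [1.0, 0.2], [0.5, 0.5]),   # positive is closer
]

print(triplet_accuracy(toy_triplets))  # 3 of 4 triplets are ranked correctly
```
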

T-VEC outperforms both its base model and other strong general-purpose models on telecom-specific benchmarks while retaining competitive general performance. The following table compares T-VEC with its base model and other general-purpose embedding models on retrieval (ArguAna), reranking (SciDocsRR), and STS tasks:

| Model                      | ArguAna | SciDocsRR | STS12   | STS13   | STS14   | STS15   | STS16   | STSBenchmark |
|----------------------------|---------|-----------|---------|---------|---------|---------|---------|--------------|
| gte-Qwen2-1.5B-instruct    | 0.62335 | 0.81558   | 0.72805 | 0.84699 | 0.78803 | 0.87450 | 0.84938 | 0.85379      |
| T-VEC                      | 0.61150 | 0.83970   | 0.80320 | 0.88220 | 0.82750 | 0.88260 | 0.84780 | 0.88050      |
| all-MiniLM-L6-v2           | 0.50167 | 0.87119   | 0.72369 | 0.80603 | 0.75589 | 0.85390 | 0.78989 | 0.82032      |
| all-mpnet-base-v2          | 0.46521 | 0.88654   | 0.72634 | 0.83485 | 0.78000 | 0.85663 | 0.80030 | 0.83422      |
| bge-base-en-v1.5           | 0.63616 | 0.87494   | 0.78028 | 0.84184 | 0.82273 | 0.87957 | 0.85474 | 0.86418      |
| e5-base-v2                 | 0.51604 | 0.82834   | 0.73489 | 0.82997 | 0.80446 | 0.88181 | 0.83659 | 0.85480      |
| jina-embeddings-v2-base-en | 0.44152 | 0.83106   | 0.74278 | 0.84177 | 0.78808 | 0.87553 | 0.85347 | 0.84842      |
| instructor-xl              | 0.54884 | 0.79538   | 0.74085 | 0.85046 | 0.80318 | 0.88359 | 0.83784 | 0.83048      |
| gte-base                   | 0.57151 | 0.87083   | 0.75707 | 0.85729 | 0.81510 | 0.88810 | 0.83824 | 0.85738      |
| multilingual-e5-base       | 0.47829 | 0.80392   | 0.77933 | 0.76890 | 0.77535 | 0.88373 | 0.82699 | 0.84201      |
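
The summary table reports the STS score as a Spearman correlation. Assuming the per-dataset STS columns use the same metric, each value measures how well the ranking of model similarity scores agrees with the ranking of human similarity judgments. The sketch below computes it from scratch; the `predicted` and `gold` numbers are invented for illustration and are not taken from any benchmark.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


predicted = [0.91, 0.35, 0.78, 0.10, 0.55]  # invented cosine similarities
gold = [4.8, 1.5, 3.0, 0.4, 3.9]            # invented human scores (0 to 5)

print(round(spearman(predicted, gold), 3))
```

Because only ranks matter, the metric is unaffected by the scale of the similarity scores.
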
## Limitations

- Reduced performance on non-domain tasks (e.g., AllNLI) due to specialization
- Large size may impact deployment on edge devices
- May miss recent telecom developments outside the training set

## Ethical Considerations

- Use in critical telecom systems should be validated by domain experts
- May reflect terminology biases from dominant vendors in the dataset
- Open licensing (MIT) supports transparency and community contributions
## Usage

### Installation

```bash
pip install transformers torch
```

### Load and Run

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The repository ships custom code, so loading needs trust_remote_code=True.
model_id = "NetoAISolutions/T-VEC"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

texts = ["5G NR architecture", "LTE handover", "Core network functions"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=32000)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state

# Mean-pool over real tokens only so padding does not skew the embeddings.
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity of the first text against the other two.
cos_sim = torch.nn.functional.cosine_similarity(emb[0:1], emb[1:], dim=1)
print(cos_sim)
```
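
Once texts are embedded, semantic search over a document collection reduces to ranking documents by cosine similarity to the query embedding. The sketch below shows that ranking step in plain Python. The `rank_documents` helper, the document ids, and the 3-dimensional vectors are hypothetical stand-ins for real T-VEC embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs by descending cosine similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical document ids with toy vectors in place of model embeddings.
doc_vecs = {
    "doc_nr_architecture": [0.9, 0.1, 0.0],
    "doc_lte_handover": [0.2, 0.9, 0.1],
    "doc_core_network": [0.1, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]

for doc_id, score in rank_documents(query_vec, doc_vecs, top_k=2):
    print(doc_id, round(score, 3))
```

For large collections, a vector index would usually replace this exhaustive scan, but the scoring logic stays the same.
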
## Citation

```bibtex
@article{ethiraj2025tvec,
  title={T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning},
  author={Ethiraj, Vignesh and Menon, Sidhanth and Vijay, Divya},
  journal={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2504.16460}
}
```
## Contact

- For questions or contributions, visit https://www.netoai.ai.
| --- | |