OpusTranslate Collection Collection of tiny models for the OpusTranslate mobile phone application. • 25 items • Updated Apr 13 • 5
Article Train AI models with Unsloth and Hugging Face Jobs for FREE burtenshaw, danielhanchen, shimmyshimmer, mlabonne, davanstrien, evalstate • Feb 20 • 102
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 505
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published Jan 20 • 28
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR lightonai • Oct 23, 2025 • 73
Article M2.1: Multilingual and Multi-Task Coding with Strong Generalization MiniMaxAI • Jan 5 • 40
Article Efficient MultiModal Data Pipeline ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 70
Article Transformers v5: Simple model definitions powering the AI ecosystem lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 311
SHAMIYAT: A Collection of Syrian Dialect Datasets & LLMs Collection A collection of datasets and language models focused on the Syrian dialect, supporting NLP research and applications for Syria • 4 items • Updated Nov 28, 2025 • 2
Article How to train a new language model from scratch using Transformers and Tokenizers julien-c • Feb 14, 2020 • 61
Yiddish Whisper Training Collection Yiddish based Whisper post-training - Crowd Sourced Open Data • 10 items • Updated Mar 2 • 5
Scaling Low-Res MT via Synthetic Data Generation with LLMs Collection Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted to the EMNLP 2025 main conference. • 8 items • Updated Sep 16, 2025 • 1
Scaling Low-Resource MT via Synthetic Data Generation with LLMs Paper • 2505.14423 • Published May 20, 2025 • 2
DictaBERT Collection Collection of state-of-the-art language models for Hebrew, finetuned for various tasks, as detailed in the article: https://arxiv.org/abs/2308.16687 • 17 items • Updated Apr 4, 2024 • 7