Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation Paper • 2511.17290 • Published Nov 21, 2025 • 1
🇪🇪 Estonian LLM Evaluation Collection A collection of resources for evaluation of LLM capabilities in the Estonian language. • 33 items • Updated Dec 13, 2025 • 5
Multilingual Benchmarks Collection Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets (EACL MME 2026) • 29 items • Updated 11 days ago • 2
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets Paper • 2602.22207 • Published 12 days ago • 39
Community Evals: Because we're done trusting black-box leaderboards over the community Article • Feb 4 • 88
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 41
Jupyter Agent Collection Blog: https://huggingface.co/blog/jupyter-agent-2 • 4 items • Updated Sep 12, 2025 • 3
MamayLM-v1.0-Gemma-3 Collection First Open and Multimodal Ukrainian-focused LLM • 5 items • Updated Oct 8, 2025 • 18
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Article • Sep 11, 2025 • 184
Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1, 2025 • 335
Announcing UA-Code-Bench: a New Benchmark for Evaluating LLMs on Competitive Programming Tasks in Ukrainian Article • Jul 12, 2025 • 2
Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Article • Aug 8, 2025 • 93
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages Article • Jul 8, 2025 • 35
The Jailbreak Tax (Jailbreak Utility) Collection Models and dataset used in paper "The Jailbreak Tax: How Useful Are Your Jailbreak Outputs" • 13 items • Updated Apr 5, 2025 • 2