Instella: Fully Open Language Models with Stellar Performance • Paper • 2511.10628 • Published Nov 13
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers • Paper • 2507.20527 • Published Jul 28
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games • Paper • 2506.10209 • Published Jun 11