My AI - a malkesh2911 Collection

Updated Jan 30

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 117

Kwaipilot/KAT-Dev-72B-Exp

Text Generation • 73B • Updated Oct 13, 2025 • 16 • 158

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 106

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17, 2025 • 19

microsoft/bitnet-b1.58-2B-4T

Text Generation • Updated Dec 17, 2025 • 11.3k • 1.37k