Submitted by taesiri 212 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds ByteDance Seed 13
Submitted by taesiri 55 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Technion Israel Institute of Technology 340 2
Submitted by taesiri 20 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models ByteDance Seed 175 2
Submitted by zhuiguang-ning 18 LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Shanghai Jiao Tong University 59 2
Submitted by ZhenYang21 14 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation Tsinghua University 37 2
Submitted by ZhenYang21 13 MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning Tsinghua University 9 3
Submitted by kwanyoung 6 Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance · 1 author 2
Submitted by s-emanuilov 2 Stemming Hallucination in Language Models Using a Licensing Oracle · 2 authors 2