ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published 8 days ago • 32
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published 11 days ago • 32
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 7 days ago • 66
Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published 18 days ago • 21
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis Paper • 2604.24198 • Published 7 days ago • 20
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models Paper • 2604.17565 • Published 15 days ago • 10
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs Paper • 2508.10180 • Published 9 days ago • 17
Efficient Agent Evaluation via Diversity-Guided User Simulation Paper • 2604.21480 • Published 11 days ago • 14
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Paper • 2604.19548 • Published 13 days ago • 15
Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data Paper • 2604.24479 • Published 7 days ago • 7
PageGuide: Browser extension to assist users in navigating a webpage and locating information Paper • 2604.23772 • Published 8 days ago • 6
Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation Paper • 2604.23604 • Published 8 days ago • 5
ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers Paper • 2604.22841 • Published 13 days ago • 4
RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing Paper • 2604.23644 • Published 8 days ago • 4
Discovering Agentic Safety Specifications from 1-Bit Danger Signals Paper • 2604.23210 • Published 9 days ago • 3
EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment Paper • 2604.22842 • Published 13 days ago • 3
Towards Understanding the Robustness of Sparse Autoencoders Paper • 2604.18756 • Published 14 days ago • 9
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate Paper • 2604.25203 • Published 6 days ago • 7
MAIC-UI: Making Interactive Courseware with Generative UI Paper • 2604.25806 • Published 6 days ago • 7