HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published 10 days ago • 20
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition Paper • 2205.13535 • Published May 26, 2022
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception Paper • 2508.15720 • Published Aug 21, 2025
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis Paper • 2402.16117 • Published Feb 25, 2024
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 70
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published Feb 7, 2025 • 24
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Paper • 2412.15156 • Published Dec 19, 2024
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 71
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation Paper • 2304.09801 • Published Apr 19, 2023
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations Paper • 2202.07800 • Published Feb 16, 2022
Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training Paper • 2309.13942 • Published Sep 25, 2023 • 1