Collections
Collections including paper arxiv:2604.02296
- DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
  Paper • 2601.01425 • Published • 53
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
  Paper • 2601.20218 • Published • 16
- FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space
  Paper • 2602.02092 • Published • 18
- 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
  Paper • 2602.03796 • Published • 64

- The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation
  Paper • 2601.17737 • Published • 56
- Advancing Open-source World Models
  Paper • 2601.20540 • Published • 135
- OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
  Paper • 2601.15369 • Published • 21
- Video-As-Prompt: Unified Semantic Control for Video Generation
  Paper • 2510.20888 • Published • 50

- RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
  Paper • 2604.06870 • Published • 41
- Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
  Paper • 2604.04934 • Published • 45
- VOID: Video Object and Interaction Deletion
  Paper • 2604.02296 • Published • 53
- FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
  Paper • 2604.08526 • Published • 20