Collections
Discover the best community collections!
Collections including paper arxiv:2508.18733
-
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Paper • 2412.09593 • Published • 18 -
wushuang98/Direct3D-S2
Image-to-3D • Updated • 35 • 79 -
yejunliang23/ShapeLLM-7B-omni
Image-to-3D • 8B • Updated • 683 • 14 -
tencent/Hunyuan3D-2.1
Image-to-3D • Updated • 19.4k • 796
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 19 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Describe Anything: Detailed Localized Image and Video Captioning
Paper • 2504.16072 • Published • 63 -
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Paper • 2410.09604 • Published -
Geospatial Mechanistic Interpretability of Large Language Models
Paper • 2505.03368 • Published • 11 -
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Paper • 2505.02836 • Published • 8
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 19 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Paper • 2412.09593 • Published • 18 -
wushuang98/Direct3D-S2
Image-to-3D • Updated • 35 • 79 -
yejunliang23/ShapeLLM-7B-omni
Image-to-3D • 8B • Updated • 683 • 14 -
tencent/Hunyuan3D-2.1
Image-to-3D • Updated • 19.4k • 796
-
Describe Anything: Detailed Localized Image and Video Captioning
Paper • 2504.16072 • Published • 63 -
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Paper • 2410.09604 • Published -
Geospatial Mechanistic Interpretability of Large Language Models
Paper • 2505.03368 • Published • 11 -
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Paper • 2505.02836 • Published • 8
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22