jackw
hg2wzh
AI & ML interests
None yet
Recent Activity
upvoted a collection 11 days ago: GME Models
upvoted a collection 17 days ago: Searching for Better ViT Baselines
upvoted a collection 18 days ago: MM Grounding DINO
Organizations
None yet
Datasets
Embed
VLMs
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 78
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published • 48
AIDC-AI/Ovis2-2B
Image-Text-to-Text • 2B • Updated • 1.44k • 59
DAMO-NLP-SG/VideoLLaMA3-2B
Video-Text-to-Text • 2B • Updated • 2.76k • 16
Text-to-Image
Datasets
Reasoning
Embed
CLIP series
VLMs
LLMs