Collections including paper arxiv:2504.17192 (Paper2Code)

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 16 upvotes
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published • 1 upvote
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97 upvotes
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 158 upvotes

- Boosting Generative Image Modeling via Joint Image-Feature Synthesis
  Paper • 2504.16064 • Published • 14 upvotes
- LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
  Paper • 2504.14032 • Published • 7 upvotes
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 155 upvotes
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 120 upvotes

- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
  Paper • 2504.08791 • Published • 137 upvotes
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 120 upvotes
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 120 upvotes
- Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
  Paper • 2506.16406 • Published • 130 upvotes

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29 upvotes
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14 upvotes
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44 upvotes
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23 upvotes