Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.04615

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Runtime error

Featured

454

OmniParser V2

🏢

454

OmniParser, turn your LLM into GUI agent
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
microsoft/Magma-8B

Robotics • Updated Dec 10, 2025 • 991 • 415
mlfoundations/Gelato-30B-A3B

Image-Text-to-Text • Updated Nov 15, 2025 • 130 • 31

Llammy3.2-3B-GUFF

prithivMLmods/Llama-Sentient-3.2-3B-Instruct

Text Generation • Updated Dec 10, 2024 • 1 • 9
bartendr604/Llama.Diffusion.Flix

Updated Apr 12, 2025 • 1
Running

1.42k

FLUX Unlimited

🔥

1.42k

Use the FLUX model as much as you want.
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23, 2025 • 58.8k • 97

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55
Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25, 2024 • 55
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Papers related to web agents, screen agents, autonomous agents that can mimic human users in computer and/or web use

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 27
Mind2Web: Towards a Generalist Agent for the Web

Paper • 2306.06070 • Published Jun 9, 2023 • 21
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 44

Research on GUI Models

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 215
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 90
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 79

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Runtime error

Featured

454

OmniParser V2

🏢

454

OmniParser, turn your LLM into GUI agent
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
microsoft/Magma-8B

Robotics • Updated Dec 10, 2025 • 991 • 415
mlfoundations/Gelato-30B-A3B

Image-Text-to-Text • Updated Nov 15, 2025 • 130 • 31

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Llammy3.2-3B-GUFF

prithivMLmods/Llama-Sentient-3.2-3B-Instruct

Text Generation • Updated Dec 10, 2024 • 1 • 9
bartendr604/Llama.Diffusion.Flix

Updated Apr 12, 2025 • 1
Running

1.42k

FLUX Unlimited

🔥

1.42k

Use the FLUX model as much as you want.
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23, 2025 • 58.8k • 97

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55
Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25, 2024 • 55
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101

Papers related to web agents, screen agents, autonomous agents that can mimic human users in computer and/or web use

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 27
Mind2Web: Towards a Generalist Agent for the Web

Paper • 2306.06070 • Published Jun 9, 2023 • 21
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Research on GUI Models

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 215
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 90
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 79

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs