Bohan Zhai

Borise

AI & ML interests

LLM, Audio, NLP, 3D vision, vision language

upvoted a paper 4 months ago

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Paper • 2603.23481 • Published Mar 24 • 7

upvoted an article 8 months ago

Article

📌 Rethinking Multimodality from an Industry Perspective: Captioning Is Far More Important Than You Think

Borise • Nov 29, 2025 • 3

upvoted a paper 8 months ago

CaptionQA: Is Your Caption as Useful as the Image Itself?

Paper • 2511.21025 • Published Nov 26, 2025 • 29

upvoted an article over 1 year ago

Article

SigLIP 2: A better multilingual vision language encoder

ariG23498, merve, qubvel-hf +1 • Feb 21, 2025 • 222

upvoted a paper almost 2 years ago

Agile Continuous Jumping in Discontinuous Terrains

Paper • 2409.10923 • Published Sep 17, 2024 • 12

upvoted an article almost 2 years ago

Article

Key Insights into the Law of Vision Representations in MLLMs

Borise • Sep 2, 2024 • 20

upvoted a paper almost 2 years ago

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29, 2024 • 95

upvoted a paper over 2 years ago

HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption

Paper • 2310.01779 • Published Oct 3, 2023 • 4