This model is Rex-Omni, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "Detect Anything via Next Point Prediction". It is compatible with the Hugging Face transformers library and is licensed under the IDEA License 1.0.
Detect Anything via Next Point Prediction
Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.
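To make the "detection as next-token prediction" framing more concrete, here is a minimal sketch of how a bounding box could be turned into a short sequence of discrete coordinate indices that a language model can emit one token at a time. The bin count of 1000, the function names, and the rounding scheme are all assumptions made for illustration; this card does not specify the actual tokenization used by Rex-Omni.

```python
from typing import List, Sequence

# ASSUMPTION: coordinates are quantized into a fixed number of bins.
# The value 1000 is illustrative and not stated in this model card.
NUM_BINS = 1000


def quantize(value: float, size: float, bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, bins - 1]."""
    index = int(value / size * bins)
    return min(max(index, 0), bins - 1)


def dequantize(index: int, size: float, bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the center of the bin."""
    return (index + 0.5) / bins * size


def box_to_bins(box: Sequence[float], width: float, height: float) -> List[int]:
    """Convert (x0, y0, x1, y1) in pixels to four bin indices."""
    x0, y0, x1, y1 = box
    return [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]


def bins_to_box(indices: Sequence[int], width: float, height: float) -> List[float]:
    """Convert four bin indices back to approximate pixel coordinates."""
    i0, j0, i1, j1 = indices
    return [
        dequantize(i0, width),
        dequantize(j0, height),
        dequantize(i1, width),
        dequantize(j1, height),
    ]
```

Under this scheme every box becomes four integers, so predicting an object amounts to predicting a category name followed by four coordinate tokens. The round trip loses at most one bin width of precision per coordinate.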

🚀 Quick Start
1. Installation
conda create -n rexomni -m python=3.10
conda activate rexomni
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/IDEA-Research/Rex-Omni.git
cd Rex-Omni
pip install -v -e .
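After the install steps, a quick sanity check can confirm that the packages are importable from the active environment. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and the module name `rex_omni` is taken from the import used in the Quick Start code.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it cannot be imported.

    Returns "unknown" when the module is importable but no distribution
    metadata is found under the same name.
    """
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"


if __name__ == "__main__":
    # Packages named in the installation steps above.
    for name in ("torch", "torchvision", "rex_omni"):
        version = installed_version(name)
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

If `torch` or `torchvision` reports a version other than the pinned one, or `rex_omni` is reported as not installed, the environment is probably not the one created above.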
2. Quick Start: Using Rex-Omni for Detection
from PIL import Image
from rex_omni import RexOmniWrapper, RexOmniVisualize

# Initialize the model
model = RexOmniWrapper(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers",  # or "vllm"
)

# Load the image
image = Image.open("your_image.jpg")

# Run object detection
results = model.inference(
    images=image,
    task="detection",
    categories=["person", "car", "dog"],
)
result = results[0]

# Visualize the predictions
vis = RexOmniVisualize(
    image=image,
    predictions=result["extracted_predictions"],
    font_size=20,
    draw_width=5,
    show_labels=True,
)
vis.save("visualize.jpg")
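Beyond visualization, the detections can be post-processed directly. The sketch below assumes that `result["extracted_predictions"]` is a dictionary mapping each category name to a list of entries shaped like `{"type": "box", "coords": [x0, y0, x1, y1]}`. That layout is an assumption and is not documented in this card, so inspect the actual output before relying on it. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# ASSUMPTION: predictions look like
#   {"person": [{"type": "box", "coords": [x0, y0, x1, y1]}, ...], ...}
# This layout is illustrative; check it against the real model output.
BoxRow = Tuple[str, float, float, float, float]


def flatten_boxes(predictions: Dict[str, List[dict]]) -> List[BoxRow]:
    """Flatten per-category box predictions into (category, x0, y0, x1, y1) rows."""
    rows: List[BoxRow] = []
    for category, items in predictions.items():
        for item in items:
            if item.get("type") != "box":
                continue  # skip points, polygons, or other prediction types
            x0, y0, x1, y1 = (float(v) for v in item["coords"])
            rows.append((category, x0, y0, x1, y1))
    return rows


def count_per_category(rows: List[BoxRow]) -> Dict[str, int]:
    """Count how many boxes were found for each category."""
    return dict(Counter(row[0] for row in rows))


# Hand-written example data in the assumed layout (not real model output).
example_predictions = {
    "person": [{"type": "box", "coords": [10, 20, 110, 220]}],
    "dog": [
        {"type": "box", "coords": [30, 40, 90, 120]},
        {"type": "point", "coords": [5, 5]},
    ],
}
example_rows = flatten_boxes(example_predictions)
```

With real output, the call would be `flatten_boxes(result["extracted_predictions"])`, and the resulting rows can be written to CSV or filtered by box size.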
3. Tutorials
We provide a series of tutorials to help you get started with Rex-Omni.
- Detection Example
- Pointing Example
- OCR Example
- Keypointing Example
- Visual Prompting Example
- Batch Inference Example
📄 License
Rex-Omni is licensed under the IDEA License 1.0, Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
📧 Contact
For questions and feedback, please contact us at:
- Email: jiangqing@idea.edu.cn
- GitHub Issues: IDEA-Research/Rex-Omni
7. Citation
Rex-Omni builds on a series of prior works. The paper introducing Rex-Omni can be cited as:
@misc{jiang2025detectpointprediction,
  title={Detect Anything via Next Point Prediction},
  author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
  year={2025},
  eprint={2510.12798},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.12798},
}
Model tree for IDEA-Research/Rex-Omni
Base model: Qwen/Qwen2.5-VL-3B-Instruct