Instructions to use IDEA-Research/Rex-Omni with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use IDEA-Research/Rex-Omni with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="IDEA-Research/Rex-Omni")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("IDEA-Research/Rex-Omni")
model = AutoModelForImageTextToText.from_pretrained("IDEA-Research/Rex-Omni")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use IDEA-Research/Rex-Omni with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "IDEA-Research/Rex-Omni"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-Research/Rex-Omni",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/IDEA-Research/Rex-Omni

SGLang

How to use IDEA-Research/Rex-Omni with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "IDEA-Research/Rex-Omni" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-Research/Rex-Omni",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "IDEA-Research/Rex-Omni" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-Research/Rex-Omni",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use IDEA-Research/Rex-Omni with Docker Model Runner:
```
docker model run hf.co/IDEA-Research/Rex-Omni
```

This model is Rex-Omni, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "Detect Anything via Next Point Prediction". It is compatible with the Hugging Face transformers library and is licensed under the IDEA License 1.0.

Detect Anything via Next Point Prediction

Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.

🚀 Quick Start

Installation

conda create -n rexomni -m python=3.10
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/IDEA-Research/Rex-Omni.git
cd Rex-Omni
pip install -v -e .

2. Quick Start: Using Rex-Omni for Detection

from PIL import Image
from rex_omni import RexOmniWrapper, RexOmniVisualize

# Initialize model
model = RexOmniWrapper(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers"  # or "vllm"
)

# Load image
image = Image.open("your_image.jpg")

# Object Detection
results = model.inference(
    images=image,
    task="detection",
    categories=["person", "car", "dog"]
)

result = results[0]

# 4) Visualize
vis = RexOmniVisualize(
    image=image,
    predictions=result["extracted_predictions"],
    font_size=20,
    draw_width=5,
    show_labels=True,
)
vis.save("visualize.jpg")

3. Tutorials

We provide a series of tutorials to help you get started with Rex-Omni.

📄 License

Rex-Omni is licensed under the IDEA License 1.0, Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

🔗 Links

🏠 Homepage
🎮 Demo

📧 Contact

For questions and feedback, please contact us at:

Email: jiangqing@idea.edu.cn
GitHub Issues: IDEA-Research/Rex-Omni

7. Citation

Rex-Omni comes from a series of prior works. If you’re interested, you can take a look.

@misc{jiang2025detectpointprediction,
      title={Detect Anything via Next Point Prediction}, 
      author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
      year={2025},
      eprint={2510.12798},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12798}, 
}

Downloads last month: 40,904

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for IDEA-Research/Rex-Omni

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Finetuned

(754)

this model

Space using IDEA-Research/Rex-Omni 1

Papers for IDEA-Research/Rex-Omni

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

Paper • 2411.14347 • Published Nov 21, 2024 • 16