Instructions to use numind/NuExtract-2.0-4B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use numind/NuExtract-2.0-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="numind/NuExtract-2.0-4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("numind/NuExtract-2.0-4B")
model = AutoModelForImageTextToText.from_pretrained("numind/NuExtract-2.0-4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (the prompt is sliced off).
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))


vLLM

How to use numind/NuExtract-2.0-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "numind/NuExtract-2.0-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-2.0-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
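For readers who prefer Python over curl, the same request can be sent with only the standard library. This is a minimal sketch, assuming the `vllm serve` server is listening on the default port 8000; the helper names are hypothetical and the payload simply mirrors the curl body. The same payload should also work against the SGLang server on port 30000, since both expose an OpenAI-compatible `/v1/chat/completions` route.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same OpenAI-style chat body as the curl example (text part first, then the image)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat_completions_url(base_url: str) -> str:
    """Join a server base URL with the OpenAI-compatible chat completions route."""
    return base_url.rstrip("/") + "/v1/chat/completions"


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        chat_completions_url(base_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a running server; not executed here):
# payload = build_chat_payload(
#     "numind/NuExtract-2.0-4B",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it easy to reuse the body for either server by changing only the base URL.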

Use Docker

docker model run hf.co/numind/NuExtract-2.0-4B

SGLang

How to use numind/NuExtract-2.0-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "numind/NuExtract-2.0-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-2.0-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "numind/NuExtract-2.0-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-2.0-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


Why is NuExtract-2.0-8B inferior to the 4B?

by ikiransuryavanshi - opened Jul 21, 2025

Discussion

ikiransuryavanshi

Jul 21, 2025 (edited Jul 21, 2025)

I used this model recently and noticed that the 4B model performs much better than the 8B, even though the underlying VL models come from the same family (Qwen2.5) and differ only in capacity (3B and 7B, respectively).

By the way, I love the way this model works. For now, I am passing a field description in place of the data type in the template, and it works for my use case. However, it would be great if there were a way to provide a description of each field we want to extract.
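The workaround described above can be sketched as: start from a typed template and swap in a short description for selected fields. This is a minimal sketch, assuming a flat template whose values are type names; the type strings (`verbatim-string`, `number`) reflect our reading of the NuExtract 2.0 model card, and the field names and the `with_descriptions` helper are hypothetical. It is not an official field-description feature.

```python
import json

# Conventional NuExtract-style template: each value names the expected type.
# Field names are hypothetical; type strings are assumed from the model card.
typed_template = {
    "invoice_number": "verbatim-string",
    "total_amount": "number",
}

# Hypothetical descriptions to use instead of the type for some fields.
descriptions = {
    "total_amount": "the final amount due, including tax",
}


def with_descriptions(template: dict, field_descriptions: dict) -> dict:
    """Return a copy of a flat template where each described field's type is replaced by its description."""
    return {key: field_descriptions.get(key, value) for key, value in template.items()}


def to_template_string(template: dict) -> str:
    """Serialize a template dict to an indented JSON string for the prompt."""
    return json.dumps(template, indent=4)


described_template = with_descriptions(typed_template, descriptions)
print(to_template_string(described_template))
```

One trade-off: a described field presumably loses its explicit type hint, so the model gets no direct cue about, for example, numeric formatting for that field.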

liamcripwell

Jul 21, 2025

Thanks for trying out the models -- it's good to hear you are liking them. :)

I'm not entirely sure why the 4B is outperforming the 8B for you. The 8B is reliably a bit stronger than the 4B across our benchmarks, so it might be something specific to your domain. Can you describe what kind of problem/data you are working on?

By the way, yes: an official way to provide field descriptions is something we have been asked for a lot, so we are working to implement this feature ASAP.

Appreciate the feedback!
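To check whether a 4B-vs-8B gap is specific to one domain, one option is to run both checkpoints on the same handful of hand-labeled documents and compare per-field accuracy. This is a minimal sketch, assuming each model's output has already been parsed into a flat dict per document; the normalization rule and exact-match metric are our own choices, not NuMind's benchmark methodology, and the field names in the usage are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def normalize(value: object) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(str(value).lower().split()) if value is not None else ""


def field_accuracy(predictions: List[Record], gold: List[Record]) -> Dict[str, float]:
    """Per-field exact-match accuracy (after normalization) across paired documents."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    fields = sorted({key for record in gold for key in record})
    scores: Dict[str, float] = {}
    for field in fields:
        hits = sum(
            normalize(pred.get(field)) == normalize(ref.get(field))
            for pred, ref in zip(predictions, gold)
        )
        scores[field] = hits / len(gold)
    return scores


# Usage: score each model's parsed outputs against the same gold labels.
# scores_4b = field_accuracy(outputs_4b, gold_labels)
# scores_8b = field_accuracy(outputs_8b, gold_labels)
```

Looking at per-field scores rather than a single average can show whether the gap comes from a few field types (dates or amounts, say) or is spread across the board.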

ikiransuryavanshi

Jul 21, 2025

@liamcripwell thanks for the prompt response. I am currently using this model to extract data from insurance documents, such as the insured name, start date, end date, policy number, and premium amount.
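For the insurance fields listed above, a NuExtract-style template might look like the following. This is a sketch: the field names are hypothetical, the type strings (`verbatim-string`, `date-time`, `number`) are assumed from the NuExtract 2.0 model card, and `missing_fields` is only a convenience helper for spotting fields the model left empty.

```python
import json

# Hypothetical field names for the insurance use case; type strings are assumed.
INSURANCE_TEMPLATE = {
    "insured_name": "verbatim-string",
    "start_date": "date-time",
    "end_date": "date-time",
    "policy_number": "verbatim-string",
    "premium_amount": "number",
}

# JSON string form of the template, to be passed to the model as described in the model card.
template_str = json.dumps(INSURANCE_TEMPLATE, indent=4)


def missing_fields(extracted: dict, template: dict = INSURANCE_TEMPLATE) -> list:
    """List template fields that are absent or null in a parsed model output, in template order."""
    return [key for key in template if extracted.get(key) is None]
```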

Also, this may not be the right thread, but the flash-attn 2.8.1 package, recently released on July 10, is causing the model to fail. I had to fall back to an older version (2.7.3) to make it work.
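The simplest workaround is to pin the version that worked (`pip install flash-attn==2.7.3`). Alternatively, the attention backend can be chosen at load time. This is a minimal sketch, assuming Transformers' `attn_implementation` argument; the blocklist contains only the version reported as failing in this thread, so other releases are untested here, and falling back to `"sdpa"` (PyTorch's built-in scaled dot-product attention) is our choice, not an official recommendation.

```python
from importlib import metadata
from typing import Optional

# Only the version reported as failing in this thread; other versions are untested here.
KNOWN_BAD_FLASH_ATTN = {"2.8.1"}


def installed_flash_attn_version() -> Optional[str]:
    """Return the installed flash-attn version string, or None if the package is not installed."""
    try:
        return metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return None


def choose_attn_implementation(flash_attn_version: Optional[str]) -> str:
    """Use FlashAttention-2 unless flash-attn is missing or on the blocklist; otherwise fall back to SDPA."""
    if flash_attn_version is None or flash_attn_version in KNOWN_BAD_FLASH_ATTN:
        return "sdpa"
    return "flash_attention_2"


# Usage (requires transformers and the model weights; not executed here):
# from transformers import AutoModelForImageTextToText
# model = AutoModelForImageTextToText.from_pretrained(
#     "numind/NuExtract-2.0-4B",
#     attn_implementation=choose_attn_implementation(installed_flash_attn_version()),
# )
```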

baiall

Aug 10, 2025


Same here: when I use flash-attn, it doesn't work; it only works with the plain Transformers implementation. As for your point that “it would be great if there's a way to provide description of the field which we want to extract”: I'm using Gemini or Claude to help me figure out the prompt rules, and that works. I hope NuMind will release a dedicated rule-understanding model to go with this model.
