Instructions to use numind/NuExtract-2.0-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use numind/NuExtract-2.0-4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="numind/NuExtract-2.0-4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("numind/NuExtract-2.0-4B")
model = AutoModelForImageTextToText.from_pretrained("numind/NuExtract-2.0-4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use numind/NuExtract-2.0-4B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "numind/NuExtract-2.0-4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "numind/NuExtract-2.0-4B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker
docker model run hf.co/numind/NuExtract-2.0-4B
- SGLang
How to use numind/NuExtract-2.0-4B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "numind/NuExtract-2.0-4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "numind/NuExtract-2.0-4B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "numind/NuExtract-2.0-4B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "numind/NuExtract-2.0-4B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

- Docker Model Runner
How to use numind/NuExtract-2.0-4B with Docker Model Runner:
docker model run hf.co/numind/NuExtract-2.0-4B
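Both the vLLM and SGLang servers above expose an OpenAI-compatible chat completions endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `post_chat` are hypothetical, and it assumes the response follows the usual OpenAI `choices[0].message.content` shape.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(model: str, text: str, image_url: Optional[str] = None) -> dict:
    """Build an OpenAI-style chat payload: text part first, then an optional image part."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI response layout.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server; vLLM defaults to port 8000, the SGLang commands above use 30000):
# payload = build_chat_request(
#     "numind/NuExtract-2.0-4B",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(post_chat("http://localhost:8000", payload))
```

Keeping payload construction separate from the HTTP call makes it easy to inspect or log the exact request before sending it to either server.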
Why is NuExtract-2.0-8B inferior to 4B?
I used this model recently and noticed that the 4B model performs much better than the 8B, even though the underlying vision-language models come from the same family (Qwen2.5-VL), at 3B and 7B respectively.
BTW, I love the way this model works. For now, I am passing a field description in place of the data type in the template, and it is working for my use case. However, it would be great if there were a way to provide a description of each field we want to extract.
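To illustrate the workaround described above, here is a sketch of the two template styles side by side. The type names (`verbatim-string`, `date-time`) are assumed from the NuExtract 2.0 template conventions and should be checked against the model card; the field names and the helper `to_template_string` are hypothetical.

```python
import json

# Conventional template: each field maps to a type name.
# (Type names assumed from NuExtract 2.0 conventions; verify against the model card.)
typed_template = {
    "insured_name": "verbatim-string",
    "policy_number": "verbatim-string",
    "start_date": "date-time",
}

# Workaround from the post: the value is a natural-language description instead of a type.
# Field names and descriptions here are hypothetical.
described_template = {
    "insured_name": "Full name of the policyholder as written in the document",
    "policy_number": "Policy identifier, usually near the top of the first page",
    "start_date": "Date on which the coverage begins",
}


def to_template_string(template: dict) -> str:
    """Serialize a template dict to an indented JSON string to use as the extraction template."""
    return json.dumps(template, indent=4)


print(to_template_string(described_template))
```

Because this relies on the model generalizing from descriptions it was not necessarily trained on, results may vary by field; an official description mechanism would replace this workaround.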
Thanks for trying out the models -- it's good to hear you are liking them. :)
I'm not entirely sure why the 4B is outperforming the 8B for you. The 8B is reliably a bit stronger than the 4B across our benchmarks, so it might be something specific to your domain. Can you describe what kind of problem/data you are working on?
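One way to check whether such a gap is domain-specific is to score both models field by field on a small hand-labelled sample of your own documents. The sketch below assumes each model's output has already been parsed into one dict per document; the scoring rule (case- and whitespace-insensitive exact match for strings, plain equality otherwise) and the name `field_accuracy` are illustrative assumptions, not NuMind's benchmark methodology.

```python
from collections import defaultdict


def field_accuracy(predictions: list[dict], gold: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy over aligned lists of predicted and gold records."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for pred, ref in zip(predictions, gold):
        for field, expected in ref.items():
            totals[field] += 1
            got = pred.get(field)
            if isinstance(got, str) and isinstance(expected, str):
                # Ignore trivial case/whitespace differences in string fields.
                match = got.strip().lower() == expected.strip().lower()
            else:
                match = got == expected
            hits[field] += int(match)
    return {f: hits[f] / totals[f] for f in totals}


# Toy data for illustration only (not real model outputs):
gold = [{"policy_number": "AB-123", "premium_amount": 250.0}]
out_a = [{"policy_number": "AB-123", "premium_amount": 250.0}]
out_b = [{"policy_number": "ab-123 ", "premium_amount": 260.0}]
print(field_accuracy(out_a, gold))  # {'policy_number': 1.0, 'premium_amount': 1.0}
print(field_accuracy(out_b, gold))  # {'policy_number': 1.0, 'premium_amount': 0.0}
```

Per-field scores make it easier to see whether one model is weaker overall or only on particular fields such as dates or amounts.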
Btw, yes: an official way to provide field descriptions is something we have been asked for a lot, so we are working to implement this feature ASAP.
Appreciate the feedback!
@liamcripwell thanks for the prompt response. I am currently using this model to extract data from insurance documents, such as the insured name, start date, end date, policy number, and premium amount.
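For an insurance use case like this, a template and a light post-extraction check might look like the sketch below. The field names, the type names (`verbatim-string`, `date-time`, `number`), and the assumption that dates come back as plain ISO 8601 strings (`YYYY-MM-DD`) are all illustrative; `INSURANCE_TEMPLATE` and `check_extraction` are hypothetical names.

```python
import json
from datetime import date

# Hypothetical template for the insurance fields (type names assumed; verify on the model card).
INSURANCE_TEMPLATE = {
    "insured_name": "verbatim-string",
    "policy_number": "verbatim-string",
    "start_date": "date-time",
    "end_date": "date-time",
    "premium_amount": "number",
}


def check_extraction(raw_output: str, template: dict = INSURANCE_TEMPLATE) -> list[str]:
    """Return a list of problems found in the model's JSON output (empty if none)."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems = [f"missing field: {k}" for k in template if k not in data]
    dates = {}
    for field, kind in template.items():
        value = data.get(field)
        if value is None:
            continue  # missing or null (e.g. field absent from the document)
        if kind == "date-time":
            try:
                dates[field] = date.fromisoformat(value)  # assumes YYYY-MM-DD
            except (TypeError, ValueError):
                problems.append(f"{field} is not an ISO date: {value!r}")
        elif kind == "number" and (isinstance(value, bool) or not isinstance(value, (int, float))):
            problems.append(f"{field} is not numeric: {value!r}")
    if "start_date" in dates and "end_date" in dates and dates["end_date"] < dates["start_date"]:
        problems.append("end_date is before start_date")
    return problems
```

Running a check like this on every extraction is a cheap way to catch formatting drift (for example, amounts returned as strings with thousands separators) before the data is used downstream.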
Also, this may not be the right thread, but flash-attn 2.8.1, released on 10 July, causes the model to fail. I had to downgrade to the older version 2.7.3 to make it work.
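Besides pinning an older flash-attn release, another option is to fall back to PyTorch's SDPA attention when the installed flash-attn version is one reported as broken. The sketch below picks an `attn_implementation` value for Transformers from the installed version; treating 2.8.1 and later as broken is an assumption drawn only from this thread, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: versions from this one onward were reported broken in this thread.
FIRST_BAD_FLASH_ATTN = (2, 8, 1)


def installed_flash_attn_version() -> Optional[str]:
    """Return the installed flash-attn version string, or None if it is not installed."""
    for name in ("flash-attn", "flash_attn"):  # try both spellings of the distribution name
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def choose_attn_implementation(version: Optional[str]) -> str:
    """Use "flash_attention_2" only for versions before the cutoff; otherwise "sdpa"."""
    if version is None:
        return "sdpa"
    try:
        parts = tuple(int(p) for p in version.split(".")[:3])
    except ValueError:
        return "sdpa"  # unparseable version (e.g. pre-release suffix): play it safe
    return "flash_attention_2" if parts < FIRST_BAD_FLASH_ATTN else "sdpa"


# Usage with Transformers (not executed here):
# from transformers import AutoModelForImageTextToText
# model = AutoModelForImageTextToText.from_pretrained(
#     "numind/NuExtract-2.0-4B",
#     attn_implementation=choose_attn_implementation(installed_flash_attn_version()),
# )
```

SDPA is typically somewhat slower than flash-attn but avoids the extra compiled dependency, which matches the report in this thread that the model works with plain Transformers.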
Same here: when I use flash-attn, it doesn't work; only plain Transformers works. And regarding "it would be great if there's a way to provide description of the field which we want to extract": I'm trying Gemini or Claude to help me figure out the rules for writing the prompt, and it works. I hope NuMind will release a dedicated rule-understanding model to go with this one.