Image-Text-to-Text
Transformers
Safetensors
English
Chinese
qwen2_5_vl
image-to-text
Document
VLM
OCR
VL
Camel
Openpdf
text-generation-inference
Extraction
Linking
Markdown
Document Digitization
Intelligent Document Processing (IDP)
Intelligent Word Recognition (IWR)
Optical Mark Recognition (OMR)
conversational
| {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system | |
| You are a helpful assistant.<|im_end|> | |
| {% endif %}<|im_start|>{{ message['role'] }} | |
| {% if message['content'] is string %}{{ message['content'] }}<|im_end|> | |
| {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|> | |
| {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant | |
| {% endif %} |