# Qwen2.5-3B-Instruct-hsb-dsb
This model is the TartuNLP submission to the WMT25 Shared Task on Limited Resource Slavic Languages, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data, plus general instruction-tuning datasets.
The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, achieving the top rank in the shared task.
📄 More details in the shared task paper.
⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.
## Example usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens, keeping only the newly generated continuation.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
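The same model handles both machine translation and question answering for both Sorbian languages; only the system instruction changes. A minimal sketch of prompt construction for the four task/language combinations — note that, apart from the German-to-Upper-Sorbian instruction shown above, the instruction strings below are plausible assumptions, not the exact prompts used in the shared task:

```python
# Build chat messages for the model's supported tasks. All instruction
# strings except the German -> Upper Sorbian one are assumptions; the
# exact shared-task prompt wording may differ.

INSTRUCTIONS = {
    "mt-de-hsb": "Translate the following text from German to Upper Sorbian.",
    "mt-de-dsb": "Translate the following text from German to Lower Sorbian.",  # assumed
    "qa-hsb": "Answer the following question in Upper Sorbian.",  # assumed
    "qa-dsb": "Answer the following question in Lower Sorbian.",  # assumed
}


def build_messages(task: str, text: str) -> list[dict]:
    """Return a chat message list for the given task, ready for apply_chat_template."""
    return [
        {"role": "system", "content": INSTRUCTIONS[task]},
        {"role": "user", "content": text},
    ]


messages = build_messages("mt-de-dsb", "Wie lange willst du noch bleiben?")
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the example above.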
## Shared task results
Results shared by the organizers (source).
Upper Sorbian:
| Team | DE-HSB | points | HSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 86.33 | 4 | 58.10 | 4 | 8 |
| NRC | 87.20 | 4 | 29.05 | 1 | 5 |
| SDKM | 75.73 | 2 | 55.24 | 3 | 5 |
| baseline | 13.88 | 1 | 42.86 | 2 | 3 |
Lower Sorbian:
| Team | DE-DSB | points | DSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 78.20 | 4 | 57.56 | 4 | 8 |
| NRC | 78.24 | 4 | 32.20 | 1 | 5 |
| SDKM | 64.34 | 2 | 51.71 | 3 | 5 |
| baseline | 12.21 | 1 | 45.85 | 2 | 3 |
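In both tracks, the final score is the sum of a team's per-subtask points. A quick check against the tables above (pure arithmetic on the listed point values):

```python
# Per-subtask points from the shared-task tables: (MT points, QA points).
points = {
    "hsb": {"TartuNLP": (4, 4), "NRC": (4, 1), "SDKM": (2, 3), "baseline": (1, 2)},
    "dsb": {"TartuNLP": (4, 4), "NRC": (4, 1), "SDKM": (2, 3), "baseline": (1, 2)},
}

# Final points = MT points + QA points, matching the "final points" column.
final = {
    track: {team: mt + qa for team, (mt, qa) in teams.items()}
    for track, teams in points.items()
}
print(final["hsb"]["TartuNLP"])  # 8
```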
## Training details
- Total training tokens: ~1.2B
- Sequence length: 4096
- Training hardware: LUMI supercomputer (AMD MI250x GPUs)
- Training time: ~139 GPU-hours
## Citation info
```bibtex
@inproceedings{purason-fishel-2025-tartunlp,
    title = "{T}artu{NLP} at {WMT}25 {LLM}s with Limited Resources for {S}lavic Languages Shared Task",
    author = "Purason, Taido and
      Fishel, Mark",
    editor = "Haddow, Barry and
      Kocmi, Tom and
      Koehn, Philipp and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.88/",
    doi = "10.18653/v1/2025.wmt-1.88",
    pages = "1143--1150",
    ISBN = "979-8-89176-341-8",
    abstract = "This paper describes the TartuNLP submission to the Upper Sorbian (hsb) and Lower Sorbian (dsb) tracks of the WMT25 LLMs with Limited Resources for Slavic Languages shared task, which jointly targets machine translation (MT) and question answering (QA). We develop a single multilingual model based on Qwen2.5-3B-Instruct by continuing pretraining on Sorbian monolingual and parallel data together with general instruction datasets, combining language acquisition and instruction-following in a single step. The resulting model delivers substantial improvements over the baseline Qwen2.5-3B-Instruct model and also achieves the highest ranking for both tasks in the hsb and dsb shared task tracks."
}
```