Qwen2.5-3B-Instruct-hsb-dsb

This model is the TartuNLP submission to the WMT25 LLMs with Limited Resources for Slavic Languages shared task, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data together with general instruction-tuning datasets, combining language acquisition and instruction-following in a single step.

The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, and achieved the top rank in both the hsb and dsb tracks of the shared task.

📄 More details in the shared task paper.

⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.

Example usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

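# Load the checkpoint in its saved dtype (bf16 here) and map it onto available devices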
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
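# Render the chat messages with Qwen's chat template and append the assistant turn marker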
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
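# Keep only the newly generated tokens, dropping the echoed prompt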
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
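
The same model and tokenizer can be reused for question answering. The exact QA prompt format used in the shared task is not documented on this card, so the snippet below is only a sketch with an illustrative system prompt and question.

# Sketch: QA with an illustrative prompt (not necessarily the shared task format)
qa_messages = [
    {"role": "system", "content": "Answer the following question in Upper Sorbian."},
    # Illustrative question; the actual QA data and its language may differ.
    {"role": "user", "content": "What is the cultural centre of the Upper Sorbs?"}
]
qa_text = tokenizer.apply_chat_template(
    qa_messages,
    tokenize=False,
    add_generation_prompt=True
)
qa_inputs = tokenizer([qa_text], return_tensors="pt").to(model.device)
qa_ids = model.generate(**qa_inputs, max_new_tokens=512)
# Strip the prompt tokens, as above
qa_ids = [out[len(inp):] for inp, out in zip(qa_inputs.input_ids, qa_ids)]
print(tokenizer.batch_decode(qa_ids, skip_special_tokens=True)[0])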

Shared task results

Results shared by the organizers (source).

Upper Sorbian:

| System   | DE-HSB | Points | HSB-QA | Points | Final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 86.33  | 4      | 58.10  | 4      | 8            |
| NRC      | 87.20  | 4      | 29.05  | 1      | 5            |
| SDKM     | 75.73  | 2      | 55.24  | 3      | 5            |
| baseline | 13.88  | 1      | 42.86  | 2      | 3            |

Lower Sorbian:

| System   | DE-DSB | Points | DSB-QA | Points | Final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 78.20  | 4      | 57.56  | 4      | 8            |
| NRC      | 78.24  | 4      | 32.20  | 1      | 5            |
| SDKM     | 64.34  | 2      | 51.71  | 3      | 5            |
| baseline | 12.21  | 1      | 45.85  | 2      | 3            |

Training details

  • Total training tokens: ~1.2B
  • Sequence length: 4096
  • Training hardware: LUMI supercomputer (AMD MI250X GPUs)
  • Training time: ~139 GPU-hours
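
The paper describes a single continued-pretraining step over Sorbian monolingual and parallel data mixed with general instruction datasets. Below is a minimal sketch of what such a run could look like with the Hugging Face Trainer; the data file, batch size, learning rate, and epoch count are illustrative assumptions, and only the 4096-token sequence length comes from the list above.

# A minimal continued-pretraining sketch, not the authors' actual training code.
# "sorbian_mix.txt" and all hyperparameters except max_length are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Hypothetical plain-text corpus mixing Sorbian monolingual/parallel data
# and general instruction examples, one training example per line.
raw = load_dataset("text", data_files={"train": "sorbian_mix.txt"})["train"]

def tokenize(batch):
    # Truncate to the 4096-token sequence length reported above.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="qwen2.5-3b-instruct-hsb-dsb",
    per_device_train_batch_size=1,   # illustrative
    gradient_accumulation_steps=32,  # illustrative
    learning_rate=1e-5,              # illustrative
    num_train_epochs=1,              # illustrative
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # Causal LM collator: pads batches and copies input_ids to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()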

Citation info

@inproceedings{purason-fishel-2025-tartunlp,
    title = "{T}artu{NLP} at {WMT}25 {LLM}s with Limited Resources for {S}lavic Languages Shared Task",
    author = "Purason, Taido  and
      Fishel, Mark",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.88/",
    doi = "10.18653/v1/2025.wmt-1.88",
    pages = "1143--1150",
    ISBN = "979-8-89176-341-8",
    abstract = "This paper describes the TartuNLP submission to the Upper Sorbian (hsb) and Lower Sorbian (dsb) tracks of the WMT25 LLMs with Limited Resources for Slavic Languages shared task, which jointly targets machine translation (MT) and question answering (QA). We develop a single multilingual model based on Qwen2.5-3B-Instruct by continuing pretraining on Sorbian monolingual and parallel data together with general instruction datasets, combining language acquisition and instruction-following in a single step. The resulting model delivers substantial improvements over the baseline Qwen2.5-3B-Instruct model and also achieves the highest ranking for both tasks in the hsb and dsb shared task tracks."
}