Qwen2.5-3B-Instruct-hsb-dsb

This model is the TartuNLP submission to the WMT25 LLMs with Limited Resources for Slavic Languages shared task, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data together with general instruction-tuning datasets, combining language acquisition and instruction-following in a single step.

The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, and achieved the top rank in both the hsb and dsb tracks of the shared task.

📄 More details in the shared task paper.

⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.

Example usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

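# Load the checkpoint in its saved dtype (bf16 here) and map it onto available devices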
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
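# Render the chat messages with Qwen's chat template and append the assistant turn marker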
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
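# Keep only the newly generated tokens, dropping the echoed prompt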
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
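
The same model and tokenizer can be reused for question answering. The exact QA prompt format used in the shared task is not documented on this card, so the snippet below is only a sketch with an illustrative system prompt and question.

# Sketch: QA with an illustrative prompt (not necessarily the shared task format)
qa_messages = [
    {"role": "system", "content": "Answer the following question in Upper Sorbian."},
    # Illustrative question; the actual QA data and its language may differ.
    {"role": "user", "content": "What is the cultural centre of the Upper Sorbs?"}
]
qa_text = tokenizer.apply_chat_template(
    qa_messages,
    tokenize=False,
    add_generation_prompt=True
)
qa_inputs = tokenizer([qa_text], return_tensors="pt").to(model.device)
qa_ids = model.generate(**qa_inputs, max_new_tokens=512)
# Strip the prompt tokens, as above
qa_ids = [out[len(inp):] for inp, out in zip(qa_inputs.input_ids, qa_ids)]
print(tokenizer.batch_decode(qa_ids, skip_special_tokens=True)[0])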

Shared task results

Results shared by the organizers (source).

Upper Sorbian:

| System   | DE-HSB | Points | HSB-QA | Points | Final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 86.33  | 4      | 58.10  | 4      | 8            |
| NRC      | 87.20  | 4      | 29.05  | 1      | 5            |
| SDKM     | 75.73  | 2      | 55.24  | 3      | 5            |
| baseline | 13.88  | 1      | 42.86  | 2      | 3            |

Lower Sorbian:

| System   | DE-DSB | Points | DSB-QA | Points | Final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 78.20  | 4      | 57.56  | 4      | 8            |
| NRC      | 78.24  | 4      | 32.20  | 1      | 5            |
| SDKM     | 64.34  | 2      | 51.71  | 3      | 5            |
| baseline | 12.21  | 1      | 45.85  | 2      | 3            |

Training details

  • Total training tokens: ~1.2B
  • Sequence length: 4096
  • Training hardware: LUMI supercomputer (AMD MI250X GPUs)
  • Training time: ~139 GPU-hours
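
The paper describes a single continued-pretraining step over Sorbian monolingual and parallel data mixed with general instruction datasets. Below is a minimal sketch of what such a run could look like with the Hugging Face Trainer; the data file, batch size, learning rate, and epoch count are illustrative assumptions, and only the 4096-token sequence length comes from the list above.

# A minimal continued-pretraining sketch, not the authors' actual training code.
# "sorbian_mix.txt" and all hyperparameters except max_length are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Hypothetical plain-text corpus mixing Sorbian monolingual/parallel data
# and general instruction examples, one training example per line.
raw = load_dataset("text", data_files={"train": "sorbian_mix.txt"})["train"]

def tokenize(batch):
    # Truncate to the 4096-token sequence length reported above.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="qwen2.5-3b-instruct-hsb-dsb",
    per_device_train_batch_size=1,   # illustrative
    gradient_accumulation_steps=32,  # illustrative
    learning_rate=1e-5,              # illustrative
    num_train_epochs=1,              # illustrative
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # Causal LM collator: pads batches and copies input_ids to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()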

Citation info

@inproceedings{purason-fishel-2025-tartunlp,
    title = "{T}artu{NLP} at {WMT}25 {LLM}s with Limited Resources for {S}lavic Languages Shared Task",
    author = "Purason, Taido  and
      Fishel, Mark",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.88/",
    doi = "10.18653/v1/2025.wmt-1.88",
    pages = "1143--1150",
    ISBN = "979-8-89176-341-8",
    abstract = "This paper describes the TartuNLP submission to the Upper Sorbian (hsb) and Lower Sorbian (dsb) tracks of the WMT25 LLMs with Limited Resources for Slavic Languages shared task, which jointly targets machine translation (MT) and question answering (QA). We develop a single multilingual model based on Qwen2.5-3B-Instruct by continuing pretraining on Sorbian monolingual and parallel data together with general instruction datasets, combining language acquisition and instruction-following in a single step. The resulting model delivers substantial improvements over the baseline Qwen2.5-3B-Instruct model and also achieves the highest ranking for both tasks in the hsb and dsb shared task tracks."
}