Qwen2.5-1.5B-Singlish-Transliteration

Model Summary

Qwen2.5-1.5B-Singlish-Transliteration is a fine-tuned version of the Qwen/Qwen2.5-1.5B-Instruct large language model, specialized for transliterating Singlish (phonetic Sinhala typed in English) into Sinhala script.

This model was developed to bridge the gap between informal Romanized typing and formal Sinhala script, particularly for social media content, chat logs, and digital communication in Sri Lanka.

  • Developed by: Afeef Zeed
  • Task: Singlish to Sinhala Transliteration
  • Base Model: Qwen/Qwen2.5-1.5B-Instruct (1.5B parameters)
  • Fine-Tuning Technique: LoRA (Low-Rank Adaptation) via PEFT
  • Dataset Size: ~500,000 pairs of Singlish-Sinhala text
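The exact LoRA hyperparameters are not published in this card. As a point of reference, a typical PEFT LoRA setup for a Qwen2.5-style causal LM looks like the sketch below; the rank, alpha, dropout, and target modules shown are illustrative assumptions, not the values used to train this adapter.

```python
from peft import LoraConfig

# Illustrative LoRA configuration (assumed values, not the actual training recipe)
lora_config = LoraConfig(
    r=16,                      # low-rank dimension of the adapter matrices
    lora_alpha=32,             # scaling factor applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
```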

Uses

Direct Use

The model takes a Singlish sentence as input and outputs the corresponding Sinhala text. Because it is a fine-tuned language model rather than a fixed mapping, it can use sentence context to resolve romanizations that rule-based transliterators handle one-to-one.

Example:

  • Input: "oyage nama mokakda"
  • Output: "ඔයාගේ නම මොකක්ද"
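Inputs are wrapped in the Qwen chat format before generation; the template below mirrors the prompt built in the inference snippet later in this card (the instruction wording is taken from that snippet):

```python
# Build the chat-formatted prompt used at inference time
text = "oyage nama mokakda"
prompt = (
    "<|im_start|>user\n"
    f"Transliterate this Singlish text to Sinhala: {text}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)
```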

How to Get Started with the Model

You can use this model directly with the Hugging Face transformers library.

Python Code

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# 1. Load Base Model
base_model_name = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_model_name = "Afeefzeed/Qwen2.5-Singlish-Transliteration"

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# 2. Load the Fine-Tuned Adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)
model.eval()

# 3. Define Transliteration Function
def transliterate(text):
    prompt = f"<|im_start|>user\nTransliterate this Singlish text to Sinhala: {text}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.1,                      # low temperature keeps output near-deterministic
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id   # avoid the missing-pad-token warning
        )
    
    # Decode and extract only the assistant's response
    full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return full_output.split("assistant\n")[-1].strip()

# 4. Test
input_text = "mama heta gedara yanawa"
print(f"Input: {input_text}")
print(f"Output: {transliterate(input_text)}")
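The string split at the end of transliterate works because skip_special_tokens=True strips the <|im_start|>/<|im_end|> markers but leaves the literal role words ("user", "assistant") in the decoded text. A self-contained sketch of that parsing step, with a hypothetical decoded string for illustration:

```python
def extract_assistant_reply(decoded: str) -> str:
    # After skip_special_tokens=True, the decoded text looks like:
    # "user\n<instruction>\nassistant\n<answer>"
    # so everything after the last "assistant\n" is the model's reply.
    return decoded.split("assistant\n")[-1].strip()

# Hypothetical decoded output, shaped like what tokenizer.decode returns above
decoded = (
    "user\nTransliterate this Singlish text to Sinhala: oyage nama mokakda\n"
    "assistant\nඔයාගේ නම මොකක්ද"
)
print(extract_assistant_reply(decoded))  # → ඔයාගේ නම මොකක්ද
```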