Thai Handwritten OCR (TrOCR)
A Thai Handwritten OCR model fine-tuned from Microsoft TrOCR for recognizing Thai handwritten text.
Model Details
Model Description
This model converts images of Thai handwriting into text using the TrOCR architecture, which pairs a Vision Transformer (ViT) encoder for image processing with a Transformer decoder for text generation.
- Developed by: Warit Sirikosityanggoon
- Model type: Vision Encoder-Decoder (TrOCR)
- Language(s): Thai (th)
- License: Apache 2.0
- Finetuned from: microsoft/trocr-base-handwritten
Model Sources
- Repository: waritkan/Thai-Hand-Written-TrOCR-Webapp
Uses
Direct Use
This model can be used directly to convert images of Thai handwriting into text. It is suitable for:
- Converting Thai handwritten documents
- Real-time handwriting recognition systems
- Digitizing handwritten notes
Out-of-Scope Use
- Not suitable for languages other than Thai
- May not perform well on highly irregular or illegible handwriting, or on low-quality images
Training Details
Training Data
Trained on iapp/thai_handwriting_dataset, which contains Thai handwritten images paired with their corresponding text labels.
Tokenizer
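The model uses SentencePiece with the Unigram algorithm instead of dictionary-based word segmentation because SentencePiece:
- Handles out-of-vocabulary words by falling back to smaller subword pieces
- Tolerates misspelled or incomplete words, which are common in handwriting
- Requires no pre-tokenization, which matters because Thai is written without spaces between words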
Tokenizer Configuration:
- Vocab Size: 30,000
- Character Coverage: 0.9995
- Algorithm: Unigram
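A tokenizer with the configuration above could be trained roughly as follows. This is a minimal sketch, not the author's training script: the corpus file `labels.txt` (one transcription per line) and the helper names are hypothetical, and the `thai_sp_30000` prefix is chosen only so the output file matches the `thai_sp_30000.model` name used in the usage example.

```python
def build_trainer_args(corpus_path, model_prefix="thai_sp_30000"):
    """Collect the SentencePiece training options listed in this card."""
    return {
        "input": corpus_path,            # hypothetical text file, one label per line
        "model_prefix": model_prefix,    # writes <prefix>.model and <prefix>.vocab
        "vocab_size": 30000,
        "character_coverage": 0.9995,
        "model_type": "unigram",
    }


def train_tokenizer(corpus_path, model_prefix="thai_sp_30000"):
    """Train the tokenizer and return the path of the resulting model file."""
    # Imported lazily so the argument helper works without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**build_trainer_args(corpus_path, model_prefix))
    return f"{model_prefix}.model"


if __name__ == "__main__":
    print(build_trainer_args("labels.txt"))
```

A vocabulary of 30,000 pieces needs a reasonably large corpus; SentencePiece raises an error if the requested vocabulary size is larger than the data can support.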
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 250 |
| Batch Size | 16 |
| Learning Rate | 1e-5 |
| Optimizer | AdamW |
| Training Regime | fp16 mixed precision |
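The table above maps onto a PyTorch training step roughly as sketched here. This is an assumption-laden illustration rather than the original training loop: `TrainConfig`, `steps_per_epoch`, `make_optimizer_and_scaler`, and `train_step` are hypothetical names, the batch is assumed to be a dict with `pixel_values` and `labels` tensors already on a CUDA device, and the model config is assumed to have its decoder start and pad token ids set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table in this card."""
    epochs: int = 250
    batch_size: int = 16
    learning_rate: float = 1e-5


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def make_optimizer_and_scaler(model, cfg):
    """AdamW plus a gradient scaler for fp16 mixed precision."""
    import torch  # lazy import: the helpers above do not need torch

    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
    scaler = torch.cuda.amp.GradScaler()
    return optimizer, scaler


def train_step(model, batch, optimizer, scaler):
    """One mixed-precision step; returns the loss as a Python float."""
    import torch

    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        # VisionEncoderDecoderModel computes the loss itself when labels are given.
        loss = model(pixel_values=batch["pixel_values"], labels=batch["labels"]).loss
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```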
Training Infrastructure
- Hardware: NVIDIA GPU (HPC Cluster)
- Framework: PyTorch + Hugging Face Transformers
Evaluation
Metrics
| Metric | Value |
|---|---|
| CER (Character Error Rate) | 0.488% |
How to Evaluate
import editdistance

def calculate_cer(pred, label):
    """Character Error Rate (lower is better)"""
    if len(label) == 0:
        return 1.0 if len(pred) > 0 else 0.0
    distance = editdistance.eval(pred, label)
    return distance / len(label)
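The function above scores a single prediction and returns a ratio, not a percentage. To summarise a whole test set, one option is to divide the total edit distance by the total label length. The sketch below does this with a pure-Python Levenshtein distance so it runs without `editdistance`; the names `levenshtein` and `corpus_cer` are illustrative, and the card does not state how the reported CER was aggregated.

```python
def levenshtein(a, b):
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def corpus_cer(predictions, labels):
    """Total edit distance divided by total label length over all pairs."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    total_distance = sum(levenshtein(p, l) for p, l in zip(predictions, labels))
    total_chars = sum(len(l) for l in labels)
    if total_chars == 0:
        return 1.0 if total_distance > 0 else 0.0
    return total_distance / total_chars


if __name__ == "__main__":
    ratio = corpus_cer(["สวัสดิ"], ["สวัสดี"])  # one wrong vowel mark out of six code points
    print(f"CER = {ratio:.4f} ({ratio * 100:.2f}%)")
```

Python strings are sequences of Unicode code points, so Thai vowel and tone marks each count as one character here. Multiply the ratio by 100 to compare against a figure quoted as a percentage.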
How to Get Started with the Model
Installation
pip install transformers torch sentencepiece pillow
Usage
import torch
from PIL import Image
import sentencepiece as spm
from transformers import VisionEncoderDecoderModel, ViTImageProcessor
# Load model
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten')
image_processor = ViTImageProcessor.from_pretrained('microsoft/trocr-base-handwritten')
# Load Thai tokenizer
sp = spm.SentencePieceProcessor()
sp.Load('thai_sp_30000.model')
# Load trained weights
checkpoint = torch.load('best_model.pt', map_location='cpu')
model.decoder.resize_token_embeddings(sp.GetPieceSize())
model.load_state_dict(checkpoint['model_state_dict'], strict=False)
model.eval()
# Inference
image = Image.open('handwriting.jpg').convert('RGB')
pixel_values = image_processor(image, return_tensors='pt').pixel_values
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=128,
        num_beams=4,
    )
# Decode
ids = generated_ids[0].tolist()
text = sp.DecodeIds(ids)
print(text)
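Sequences returned by `generate` normally begin with the decoder start token and may contain end-of-sequence and padding ids, which can end up in the decoded text if they map to ordinary pieces in the SentencePiece vocabulary. A small filter such as the following can be applied before `sp.DecodeIds`. It is a sketch only: `clean_generated_ids` is a hypothetical helper, and which ids the checkpoint actually uses is an assumption to verify against `model.config` and the tokenizer.

```python
def clean_generated_ids(ids, start_id, eos_id, pad_id):
    """Drop a leading start token, cut at the first EOS, and skip padding."""
    if ids and ids[0] == start_id:
        ids = ids[1:]  # handled first, because start_id and eos_id may be equal
    cleaned = []
    for token_id in ids:
        if token_id == eos_id:
            break
        if token_id == pad_id:
            continue
        cleaned.append(token_id)
    return cleaned


# Possible use with the objects from the usage example (not executed here):
#   ids = clean_generated_ids(
#       generated_ids[0].tolist(),
#       start_id=model.config.decoder_start_token_id,
#       eos_id=sp.eos_id(),
#       pad_id=sp.pad_id(),
#   )
#   text = sp.DecodeIds(ids)

if __name__ == "__main__":
    print(clean_generated_ids([2, 10, 11, 2, 1, 1], start_id=2, eos_id=2, pad_id=1))
```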
Model Architecture
Input Image
|
v
Vision Transformer (ViT) Encoder
|
v
Cross-Attention
|
v
Transformer Decoder
|
v
SentencePiece Tokenizer (Unigram)
|
v
Thai Text Output
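To make the first stage of the diagram concrete, the snippet below computes how many tokens a ViT-style encoder hands to the decoder's cross-attention. The 384-pixel input, 16-pixel patches, and single extra class token are assumptions based on the base checkpoint's published configuration; check `model.config.encoder` for the values that apply to this model.

```python
def encoder_sequence_length(image_size=384, patch_size=16, extra_tokens=1):
    """Number of encoder tokens: one per square patch plus any extra tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + extra_tokens


if __name__ == "__main__":
    # 384 / 16 = 24 patches per side, 24 * 24 = 576 patches, plus 1 class token
    print(encoder_sequence_length())
```

Each decoder step attends over this fixed-length sequence while generating one SentencePiece id at a time.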
Limitations
- Performance depends on image quality and handwriting clarity
- May not perform well on handwriting styles significantly different from training data
- Supports Thai language only
Citation
@misc{thai-handwritten-trocr,
author = {Warit Sirikosityanggoon},
title = {Thai Handwritten OCR using TrOCR},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://github.com/waritkan/Thai-Hand-Written-TrOCR-Webapp}}
}
Acknowledgements
- Microsoft TrOCR for the pretrained model
- iApp Technology for the Thai handwriting dataset
- SentencePiece for the tokenizer
Model Card Contact
- Author: Warit Sirikosityanggoon
- GitHub: waritkan/Thai-Hand-Written-TrOCR-Webapp
Model tree for waritkan/thai-ocr-model
Base model
microsoft/trocr-base-handwritten