---
datasets:
- hal-utokyo/Manga109-s
- NilanE/ParallelFiction-Ja_En-100k
language:
- ja
metrics:
- accuracy
- cer
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- OCR
- Manga
- 10m-parameters
license: apache-2.0
---

## Manga OCR Mobile (Preview)

This is a lightweight OCR model built for speed and optimized for mobile/edge devices.

It achieves high-accuracy text recognition with a much smaller footprint (about 10M parameters) than standard OCR models.

Check out the [technical docs](https://bluolightning.github.io/manga-ocr-mobile) for more details. Source code will soon be available at the [GitHub repo](https://github.com/bluolightning/manga-ocr-mobile).

# Training Details

- Pretrained on ~1 million synthetic images generated with cleaned/filtered text:
  - 60% anime (the corpus is not public)
  - 20% webnovel
  - 20% CC100
- Fine-tuned on the Manga109-s dataset (random 90% split)
- Trained in PyTorch and converted to TFLite with [AI Edge Torch](https://github.com/google-ai-edge/ai-edge-torch)
- Achieves ~7.4% CER (character error rate) and ~73% exact-match accuracy on a random 10% split of Manga109-s
  - Comparable to `PaddleOCR-VL-For-Manga`, which reaches ~10% CER and ~70% exact-match accuracy
  - The model seems to struggle with English letters and punctuation
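
For readers unfamiliar with the two metrics above, here is a minimal sketch of how CER and exact-match accuracy are commonly computed. It is not the evaluation script used for this model. It assumes corpus-level CER (total character edits divided by total reference characters) and applies no text normalization; the function names `edit_distance` and `evaluate` are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between a reference and a prediction."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def evaluate(references, hypotheses):
    """Return (corpus-level CER, exact-match accuracy) for paired string lists.

    Assumption: CER is pooled over the whole set, not averaged per sample.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    exact = sum(r == h for r, h in zip(references, hypotheses))
    return total_edits / total_chars, exact / len(references)


if __name__ == "__main__":
    refs = ["こんにちは", "ありがとう"]
    hyps = ["こんにちは", "ありがとぅ"]  # one substituted character
    cer, acc = evaluate(refs, hyps)
    print(f"CER: {cer:.1%}, exact match: {acc:.1%}")
```

Averaging CER per sample instead of pooling it can give a noticeably different figure when text lengths vary, so comparisons between models should use the same convention.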

# Acknowledgments

This project uses:
- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
- [CC-100](https://data.statmt.org/cc-100/) dataset - used for synthetic data
- [webnovels](https://huggingface.co/datasets/NilanE/ParallelFiction-Ja_En-100k) dataset - used for synthetic data

The model builds upon [kha-white/manga-ocr](https://github.com/kha-white/manga-ocr), with a significant divergence in deployment focus and data generation.

```BibTeX
@inproceedings{wang2024repvit,
  title={Repvit: Revisiting mobile cnn from vit perspective},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15909--15920},
  year={2024}
}

@misc{wang2023repvitsam,
      title={RepViT-SAM: Towards Real-Time Segmenting Anything}, 
      author={Ao Wang and Hui Chen and Zijia Lin and Jungong Han and Guiguang Ding},
      year={2023},
      eprint={2312.05760},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```