---
datasets:
- hal-utokyo/Manga109-s
- NilanE/ParallelFiction-Ja_En-100k
language:
- ja
metrics:
- accuracy
- cer
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- OCR
- Manga
- 10m-parameters
license: apache-2.0
---

## Manga OCR Mobile (Preview)

This is a lightweight OCR model built for speed and optimized for mobile/edge devices. It achieves high-accuracy text recognition with a footprint (about 10M parameters, per the model tags) much smaller than standard OCR models.

Check out the [technical docs](https://bluolightning.github.io/manga-ocr-mobile) for more details. Source code will soon be available at the [GitHub repo](https://github.com/bluolightning/manga-ocr-mobile).

# Training Details

- Pretrained on ~1 million synthetic images generated from cleaned and filtered text:
  - 60% anime (the corpus is not public)
  - 20% webnovel
  - 20% CC100
- Fine-tuned on the Manga109-s dataset (random 90% split)
- Trained in PyTorch and converted to TFLite with [AI Edge Torch](https://github.com/google-ai-edge/ai-edge-torch)
- Achieves ~7.4% CER (character error rate) and ~73% exact-match accuracy on a random 10% split of Manga109-s
  - Comparable to `PaddleOCR-VL-For-Manga`, which has ~10% CER and ~70% exact-match accuracy
- The model seems to struggle with English letters and punctuation

# Acknowledgments

This project makes use of:

- [Manga109-s](http://www.manga109.org/en/download_s.html) dataset
- [CC-100](https://data.statmt.org/cc-100/) dataset - used for synthetic data
- [webnovels](https://huggingface.co/datasets/NilanE/ParallelFiction-Ja_En-100k) dataset - used for synthetic data

The model builds upon [kha-white/manga-ocr](https://github.com/kha-white/manga-ocr), but diverges significantly in deployment focus and data generation.

```BibTeX
@inproceedings{wang2024repvit,
  title={Repvit: Revisiting mobile cnn from vit perspective},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15909--15920},
  year={2024}
}

@misc{wang2023repvitsam,
  title={RepViT-SAM: Towards Real-Time Segmenting Anything},
  author={Ao Wang and Hui Chen and Zijia Lin and Jungong Han and Guiguang Ding},
  year={2023},
  eprint={2312.05760},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
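
# Evaluation Metrics (Sketch)

The CER and exact-match figures under Training Details are reported without evaluation code, so the following is only a minimal sketch of how such metrics are commonly computed, not the actual evaluation script. It assumes CER is micro-averaged (total Levenshtein edits divided by total reference characters) and that exact match is plain string equality with no text normalization. Both choices are assumptions, and the function names `edit_distance`, `cer`, and `exact_match` are illustrative.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def _check(refs: Sequence[str], hyps: Sequence[str]) -> None:
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and the same length")


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Micro-averaged character error rate (assumed definition)."""
    _check(refs, hyps)
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


def exact_match(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    _check(refs, hyps)
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Made-up sample strings, not taken from Manga109-s.
refs = ["こんにちは", "ありがとう"]
hyps = ["こんにちわ", "ありがとう"]
print(f"CER: {cer(refs, hyps):.1%}")                  # CER: 10.0%
print(f"Exact match: {exact_match(refs, hyps):.1%}")  # Exact match: 50.0%
```

Micro-averaging weights long text boxes more heavily than short ones. Averaging per-sample CER instead would give a different number, so comparisons with other models are only meaningful when the same definition and the same test split are used.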