CONFERENCE (INTERNATIONAL) Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, Naoaki Yamashita
16th International Conference on Document Analysis and Recognition (ICDAR 2021)
September 09, 2021
In deploying scene-text spotting systems on mobile platforms such as smartphones, lightweight models with low computational cost are preferable. In concept, end-to-end (e2e) text spotting methods are suitable for such purposes because they perform text detection and recognition within a single model. However, current state-of-the-art e2e methods rely on heavy feature extractors, recurrent sequence modeling, and/or complex shape aligners to pursue accuracy on benchmarks, which keeps their computation heavy and their inference slower than real time. We explore the opposite direction: how far can we go without bells and whistles in e2e text recognition? Following this idea, we propose a text-spotting method that consists of simple convolutions and a few post-processing steps, named Context-Free TextSpotter. Experiments on standard benchmarks show that Context-Free TextSpotter achieves real-time e2e text spotting on a GPU with only three million parameters, making it the smallest and fastest among existing deep text spotters, with only an acceptable degradation in transcription quality compared with heavy state-of-the-art models. Further, we demonstrate that our text spotter runs on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
Paper (arXiv.org, external link)