Conference (international) Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, Naoaki Yamashita
16th International Conference on Document Analysis and Recognition (ICDAR 2021)
In deploying scene-text spotting systems on mobile platforms such as smartphones, lightweight models with low computational cost are preferable. In principle, end-to-end (e2e) text spotting methods are suitable for this purpose because they perform text detection and recognition in a single model. However, current state-of-the-art e2e methods rely on heavy feature extractors, recurrent sequence modeling, and/or complex shape aligners to pursue benchmark accuracy, which keeps their computation heavy and their inference slower than real time. We explore the opposite direction: how far can we go without bells and whistles in e2e text recognition? Following this idea, we propose a text-spotting method named Context-Free TextSpotter that consists of simple convolutions and a few post-processing steps. Experiments on standard benchmarks show that Context-Free TextSpotter achieves real-time e2e text spotting on a GPU with only three million parameters, making it the smallest and fastest among existing deep text spotters, with an acceptable degradation in transcription quality relative to heavy state-of-the-art models. Further, we demonstrate that our text spotter runs on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
Paper (arXiv.org, external link)