Text Recognition in Natural Images

Reading text from natural images is a challenging problem that has received significant attention in recent years. Traditional systems in this area have generally relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this project, we take a different approach: we combine unsupervised feature learning with deep, multi-layer neural networks to develop robust, high-performing modules for text recognition in natural images.

Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning

Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, David J. Wu, and Andrew Y. Ng

Abstract:

Reading text from photographs is a challenging problem that has received a significant amount of attention. Two key components of most systems are (i) text detection from images and (ii) character recognition, and many recent methods have been proposed to design better feature representations and models for both. In this paper, we apply methods recently developed in machine learning – specifically, large-scale algorithms for learning the features automatically from unlabeled data – and show that they allow us to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.
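As a rough illustration of what learning features from unlabeled data can look like, the sketch below clusters small image patches with K-means and uses the resulting centroids as a feature dictionary. The abstract does not name the algorithm, so K-means is an assumption chosen for illustration, and the function names (`learn_dictionary`, `encode`) and the toy patches are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, patch):
    """Index of the centroid closest to the given patch."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(centroids[k], patch))


def learn_dictionary(patches, num_features, iterations=10, seed=0):
    """Plain K-means over unlabeled patches; the centroids act as features."""
    rng = random.Random(seed)
    # Initialise centroids from randomly chosen patches.
    centroids = [list(p) for p in rng.sample(patches, num_features)]
    for _ in range(iterations):
        # Assignment step: group each patch with its nearest centroid.
        groups = [[] for _ in centroids]
        for patch in patches:
            groups[nearest(centroids, patch)].append(patch)
        # Update step: move each non-empty centroid to the mean of its group.
        for k, members in enumerate(groups):
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def encode(centroids, patch):
    """One-hot feature vector marking the nearest learned centroid."""
    code = [0] * len(centroids)
    code[nearest(centroids, patch)] = 1
    return code


# Toy unlabeled data: four flattened 2x2 grayscale patches (hypothetical).
patches = [
    [0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0],
    [0.9, 1.0, 0.9, 1.0],
    [1.0, 0.9, 1.0, 0.9],
]
dictionary = learn_dictionary(patches, num_features=2)
features = encode(dictionary, patches[0])
```

No labels are used at any point: the dictionary is built from the patches alone, and a supervised classifier for detection or recognition would then be trained on the encoded features. A practical system would typically normalize or whiten the patches first and use a far larger dictionary; those steps are omitted here.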

Resources:

BibTeX:
@inproceedings{CCCSSWWN11,
  author    = {Adam Coates and Blake Carpenter and Carl Case and
               Sanjeev Satheesh and Bipin Suresh and Tao Wang and
               David J. Wu and Andrew Y. Ng},
  title     = {Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning},
  booktitle = {International Conference on Document Analysis and Recognition ({ICDAR})},
  year      = {2011}
}

End-to-End Text Recognition with Convolutional Neural Networks

Tao Wang, David J. Wu, Adam Coates, and Andrew Y. Ng

Abstract:

Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.
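To make the term "lexicon-driven" concrete, the sketch below picks the lexicon word that best matches a set of per-character classifier outputs. It is a simplified illustration, not the paper's integration method: it assumes the characters are already segmented into slots, and the names `score_word` and `best_lexicon_word` and the probability values are hypothetical.

```python
import math


def score_word(word, char_probs):
    """Sum of log-probabilities of each letter of `word` at its slot.

    `char_probs` is a list with one dict per character slot, mapping a
    letter to the recognizer's probability for that letter (assumed input).
    """
    if len(word) != len(char_probs):
        return float("-inf")  # this sketch only matches equal lengths
    total = 0.0
    for letter, probs in zip(word, char_probs):
        p = probs.get(letter, 0.0)
        if p <= 0.0:
            return float("-inf")  # letter ruled out at this slot
        total += math.log(p)
    return total


def best_lexicon_word(lexicon, char_probs):
    """Return the highest-scoring lexicon word, or None if none fits."""
    if not lexicon:
        return None
    best_score, best_word = max(
        (score_word(word, char_probs), word) for word in lexicon
    )
    return best_word if best_score > float("-inf") else None


# Hypothetical recognizer output for a four-character word.
char_probs = [
    {"C": 0.6, "O": 0.3, "G": 0.1},
    {"A": 0.5, "O": 0.4, "U": 0.1},
    {"F": 0.5, "T": 0.4, "E": 0.1},
    {"E": 0.7, "F": 0.2, "S": 0.1},
]
lexicon = ["CAFE", "COTS", "BANK", "CAT"]
print(best_lexicon_word(lexicon, char_probs))  # prints CAFE
```

Restricting the output to a known word list lets the lexicon correct individual character errors: a slot where the top-scoring letter is wrong can still yield the right word if the correct letter has reasonable probability.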

Resources:

BibTeX:
@inproceedings{WWCN12,
  author    = {Tao Wang and David J. Wu and Adam Coates and Andrew Y. Ng},
  title     = {End-to-End Text Recognition with Convolutional Neural Networks},
  booktitle = {International Conference on Pattern Recognition ({ICPR})},
  year      = {2012}
}