Text detection in street level image

Abstract

Text detection in natural images is a very challenging task in computer vision. Image acquisition introduces distortions in perspective, blurring, and illumination, and characters may vary widely in shape, size, and color. In this article, we introduce a complete text detection scheme. Our architecture is based on a new process that combines a hypothesis generation step, which produces potential text boxes, with a hypothesis validation step, which filters out false detections. The hypothesis generation process relies on a new, efficient segmentation method based on a morphological operator. Regions are then filtered and classified using shape descriptors: Fourier descriptors, pseudo-Zernike moments, and an original rotation-invariant polar descriptor. The classification process relies on three SVM classifiers combined in a late fusion scheme. Detected characters are finally grouped to generate our text box hypotheses. The validation step is based on a global SVM classification of the box content, using dedicated descriptors adapted from the HOG approach. Results reported on the well-known ICDAR database show that our method is competitive. The evaluation protocol and metrics are discussed in depth, and results on a very challenging street-level database are also presented.
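
The abstract does not name the morphological operator used for segmentation. One operator of this family is toggle mapping, which compares each pixel with the minimum (grayscale erosion) and maximum (grayscale dilation) of its neighborhood and assigns it to whichever is closer. The Python sketch below illustrates that idea only; the function name `toggle_mapping_segment`, the square window of `radius` pixels, and the `min_contrast` threshold are hypothetical choices, not the paper's exact method or parameters.

```python
def toggle_mapping_segment(img, radius=1, min_contrast=10):
    """Toggle-mapping-style labelling (illustrative sketch, hypothetical parameters).

    img: 2-D list of grey levels.
    Returns a 2-D list where each pixel is 0 (closer to the local minimum, dark),
    1 (closer to the local maximum, bright) or None (local contrast too low).
    """
    h, w = len(img), len(img[0])
    labels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Square window around (x, y), clipped at the image border.
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            lo, hi = min(window), max(window)  # local erosion / dilation values
            if hi - lo < min_contrast:
                continue  # homogeneous area: leave the pixel unlabelled
            v = img[y][x]
            # Toggle: keep whichever extremum the pixel is closer to.
            labels[y][x] = 1 if hi - v < v - lo else 0
    return labels
```

Unlabelled pixels mark homogeneous areas where no reliable dark/bright decision can be made; a full pipeline would then extract connected regions from the labelled pixels.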
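
Of the three shape descriptors, the Fourier one is standard enough to sketch. A minimal version, assuming the region boundary is available as an ordered list of points, treats the contour as a complex signal and keeps the magnitudes of its low-order DFT coefficients: rotation and the choice of starting point only change the phases, coefficient 0 carries the translation, and dividing by the largest kept magnitude removes scale. The names `fourier_descriptor` and `n_coeffs` are hypothetical; the pseudo-Zernike moments and the paper's original polar descriptor are not reproduced here.

```python
import cmath
import math


def fourier_descriptor(contour, n_coeffs=8):
    """Rotation-, translation-, scale- and start-point-invariant Fourier descriptor.

    contour: ordered list of (x, y) points along a closed boundary.
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Naive DFT of the complex boundary signal (an FFT would be used in practice).
    coeffs = [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
              for u in range(n)]
    # Skip coefficient 0 (translation); magnitudes discard rotation and start point.
    mags = [abs(c) for c in coeffs[1:n_coeffs + 1]]
    # Normalise by the largest kept magnitude to remove scale.
    largest = max(mags)
    return [m / largest for m in mags]
```

The naive O(n²) transform keeps the sketch dependency-free; only the normalisation choices matter for the invariance.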
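
The abstract states that three SVM classifiers are combined by late fusion, but not how their outputs are merged. A minimal sketch, assuming each classifier returns a signed decision value, maps each value to (0, 1) with a fixed logistic function and takes a weighted average; the weights, the 0.5 threshold, and the function names are hypothetical. In practice the logistic mapping would usually be fitted per classifier (Platt scaling) rather than fixed.

```python
import math


def sigmoid(score):
    """Map an unbounded SVM decision value to (0, 1) with a fixed logistic curve."""
    return 1.0 / (1.0 + math.exp(-score))


def late_fusion(decision_values, weights=None, threshold=0.5):
    """Fuse per-descriptor SVM outputs into one character / non-character decision.

    decision_values: one signed decision value per classifier
    (e.g. one each for the Fourier, pseudo-Zernike and polar descriptors).
    Returns (fused score, accepted?).
    """
    if weights is None:
        weights = [1.0] * len(decision_values)  # hypothetical: equal weights
    probs = [sigmoid(v) for v in decision_values]
    fused = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return fused, fused >= threshold
```

Fusing at the score level lets each descriptor keep its own classifier and kernel, which is the usual motivation for late rather than early fusion.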
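
The grouping criteria are not given in the abstract either. The sketch below uses one plausible, hypothetical rule set: two character boxes are linked when their heights are similar, they overlap vertically, and the horizontal gap between them is small relative to their height; linked components (found with union-find) become text box hypotheses. Boxes are `(x0, y0, x1, y1)` tuples, and all three thresholds are assumptions.

```python
def group_characters(boxes, max_gap_ratio=1.0, max_height_ratio=2.0, min_v_overlap=0.5):
    """Group character boxes (x0, y0, x1, y1) into enclosing text-box hypotheses."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            b = boxes[j]
            ha, hb = a[3] - a[1], b[3] - b[1]
            # Heights must be comparable.
            if max(ha, hb) > max_height_ratio * min(ha, hb):
                continue
            # Boxes must share a sufficient vertical extent (same text line).
            v_overlap = min(a[3], b[3]) - max(a[1], b[1])
            if v_overlap < min_v_overlap * min(ha, hb):
                continue
            # Horizontal gap (negative when the boxes overlap).
            gap = max(a[0], b[0]) - min(a[2], b[2])
            if gap <= max_gap_ratio * max(ha, hb):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # One enclosing box per connected group.
    return [(min(b[0] for b in g), min(b[1] for b in g),
             max(b[2] for b in g), max(b[3] for b in g)) for g in groups.values()]
```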
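
For the validation step, the paper adapts descriptors from HOG; the adaptation itself is not described in the abstract. As a baseline reference only, the sketch below computes a simplified HOG: magnitude-weighted histograms of unsigned gradient orientation per cell, concatenated and L2-normalized. It omits the vote interpolation and overlapping block normalization of full HOG, and the cell size and bin count are assumed values.

```python
import math


def hog_cells(img, cell=4, n_bins=9):
    """Simplified HOG descriptor (sketch, not the paper's adapted descriptor).

    img: 2-D list of grey levels whose sides are multiples of `cell`.
    """
    h, w = len(img), len(img[0])
    feats = []
    for cy in range(0, h, cell):
        for cx in range(0, w, cell):
            hist = [0.0] * n_bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
                    # The final modulo guards against float rounding up to 180.0.
                    hist[int(angle // (180.0 / n_bins)) % n_bins] += mag
            feats.extend(hist)
    norm = math.sqrt(sum(v * v for v in feats)) or 1.0
    return [v / norm for v in feats]
```

A vertical stroke edge puts all its weight in the 0-degree bin, a horizontal one in the 90-degree bin, which is what makes such histograms discriminative for text-like content.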
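
The abstract says the evaluation protocol and metrics are discussed in depth, without defining them. For orientation, the following sketch implements the ICDAR 2003 robust reading definitions as commonly stated: the match between two rectangles is their intersection area divided by the area of the smallest rectangle containing both, precision and recall average each box's best match, and the f-measure combines them with weight `alpha`. The paper may use this protocol, a variant, or a different one; the function names here are hypothetical.

```python
def area(r):
    """Area of an (x0, y0, x1, y1) rectangle; 0 if empty."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def match(r, s):
    """Intersection area over the area of the smallest box containing both."""
    inter = (max(r[0], s[0]), max(r[1], s[1]), min(r[2], s[2]), min(r[3], s[3]))
    hull = (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))
    return area(inter) / area(hull)


def best_match(r, boxes):
    return max((match(r, s) for s in boxes), default=0.0)


def icdar2003_scores(detections, ground_truth, alpha=0.5):
    """Return (precision, recall, f) for lists of (x0, y0, x1, y1) boxes."""
    precision = (sum(best_match(d, ground_truth) for d in detections) / len(detections)
                 if detections else 0.0)
    recall = (sum(best_match(t, detections) for t in ground_truth) / len(ground_truth)
              if ground_truth else 0.0)
    if precision == 0 or recall == 0:
        return precision, recall, 0.0
    f = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f
```

Because partial overlaps earn partial credit, these scores are sensitive to how words are split or merged into boxes, one reason the choice of protocol deserves discussion.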

BibTeX (lrde.bib)

@Article{	  fabrizio.13.paa,
  author	= {Jonathan Fabrizio and Beatriz Marcotegui and Matthieu
		  Cord},
  title		= {Text detection in street level image},
  journal	= {Pattern Analysis and Applications},
  year		= 2013,
  volume	= 16,
  number	= 4,
  month		= nov,
  publisher	= {Springer},
  pages		= {519--533},
  abstract	= {Text detection in natural images is a very challenging
		  task in computer vision. Image acquisition introduces
		  distortions in perspective, blurring, and illumination,
		  and characters may vary widely in shape, size, and color.
		  In this article, we introduce a complete text detection
		  scheme. Our architecture is based on a new process that
		  combines a hypothesis generation step, which produces
		  potential text boxes, with a hypothesis validation step,
		  which filters out false detections. The hypothesis
		  generation process relies on a new, efficient segmentation
		  method based on a morphological operator. Regions are then
		  filtered and classified using shape descriptors: Fourier
		  descriptors, pseudo-Zernike moments, and an original
		  rotation-invariant polar descriptor. The classification
		  process relies on three SVM classifiers combined in a late
		  fusion scheme. Detected characters are finally grouped to
		  generate our text box hypotheses. The validation step is
		  based on a global SVM classification of the box content,
		  using dedicated descriptors adapted from the HOG approach.
		  Results reported on the well-known ICDAR database show that
		  our method is competitive. The evaluation protocol and
		  metrics are discussed in depth, and results on a very
		  challenging street-level database are also presented.}
}