Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms

Abstract

The objective of this thesis is twofold. On the one hand, it proposes a more accurate evaluation protocol for text detection systems that solves some of the existing problems in this area. On the other hand, it focuses on the design of a text rectification procedure for correcting highly deformed texts. Text detection systems have gained significant importance in recent years. The growing number of approaches proposed in the literature requires rigorous performance evaluation and ranking. In the context of text detection, an evaluation protocol relies on three elements: a reliable text reference, a set of matching rules deciding the relationship between the ground truth and the detections, and finally a set of metrics that produce intuitive scores. The few existing evaluation protocols often lack accuracy, either due to inconsistent matching procedures that provide unfair scores or due to unrepresentative metrics. Despite these issues, researchers continue to use these protocols to evaluate their work. In this Ph.D. thesis we propose a new evaluation protocol for text detection algorithms that tackles most of the drawbacks of currently used evaluation methods. This work is focused on three main contributions: firstly, we introduce a complex text reference representation that does not constrain text detectors to adopt a specific detection granularity level or annotation representation; secondly, we propose a set of matching rules capable of evaluating any type of scenario that can occur between a text reference and a detection; and finally, we show how we can analyze a set of detection results, not only through a set of metrics, but also through an intuitive visual representation. We use this protocol to evaluate different text detectors and then compare the results with those provided by alternative evaluation methods.
A frequent challenge for many Text Understanding Systems is to handle the variety of text characteristics in born-digital and natural scene images, to which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images because the camera capture angle is not normal to the plane containing the text regions. Despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. Indeed, most OCRs are not designed to handle text strings in perspective but rather expect horizontal texts in a fronto-parallel plane to provide a correct transcription. All these aspects, together with the proposition of a very challenging dataset, motivated us to propose a rectification procedure capable of correcting highly distorted texts.

Bibtex (lrde.bib)

@PhDThesis{	  calarasanu.15.phd,
  author	= {Stefania Calarasanu},
  title		= {Improvement of a text detection chain and the proposition
		  of a new evaluation protocol for text detection
		  algorithms},
  school	= {Universit\'e Pierre et Marie Curie - Paris 6},
  year		= 2015,
  address	= {Paris, France},
  month		= dec,
  abstract	= {The objective of this thesis is twofold. On one hand it
		  targets the proposition of a more accurate evaluation
		  protocol designed for text detection systems that solves
		  some of the existing problems in this area. On the other
		  hand, it focuses on the design of a text rectification
		  procedure used for the correction of highly deformed texts.
		  Text detection systems have gained a significant importance
		  during the last years. The growing number of approaches
		  proposed in the literature requires a rigorous performance
		  evaluation and ranking. In the context of text detection,
		  an evaluation protocol relies on three elements: a reliable
		  text reference, a matching set of rules deciding the
		  relationship between the ground truth and the detections
		  and finally a set of metrics that produce intuitive scores.
		  The few existing evaluation protocols often lack accuracy
		  either due to inconsistent matching procedures that provide
		  unfair scores or due to unrepresentative metrics. Despite
		  these issues, until today, researchers continue to use
		  these protocols to evaluate their work. In this Ph.D thesis
		  we propose a new evaluation protocol for text detection
		  algorithms that tackles most of the drawbacks faced by
		  currently used evaluation methods. This work is focused on
		  three main contributions: firstly, we introduce a complex
		  text reference representation that does not constrain text
		  detectors to adopt a specific detection granularity level
		  or annotation representation; secondly, we propose a set of
		  matching rules capable of evaluating any type of scenario
		  that can occur between a text reference and a detection;
		  and finally we show how we can analyze a set of detection
		  results, not only through a set of metrics, but also
		  through an intuitive visual representation. We use this
		  protocol to evaluate different text detectors and then
		  compare the results with those provided by alternative
		  evaluation methods. A frequent challenge for many Text
		  Understanding Systems is to tackle the variety of text
		  characteristics in born-digital and natural scene images to
		  which current OCRs are not well adapted. For example, texts
		  in perspective are frequently present in real-world images
		  because the camera capture angle is not normal to the plane
		  containing text regions. Despite the ability of some
		  detectors to accurately localize such text objects, the
		  recognition stage fails most of the time. Indeed, most OCRs
		  are not designed to handle text strings in perspective but
		  rather expect horizontal texts in a parallel-frontal plane
		  to provide a correct transcription. All these aspects,
		  together with the proposition of a very challenging
		  dataset, motivated us to propose a rectification procedure
		  capable of correcting highly distorted texts.}
}