Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms

From LRDE

Abstract

Hierarchical image representations are widely used in image processing to model the content of an image in a multi-scale structure. A well-known hierarchical representation is the tree of shapes (ToS), which encodes the inclusion relationship between connected components from different thresholded levels. This kind of tree is self-dual, contrast-change invariant, and popular in the computer vision community. In our work, we use this representation to compute a new distance which belongs to the mathematical morphology domain. Distance transforms and the saliency maps they induce are generally used in image processing, computer vision, and pattern recognition. One of the most commonly used distance transforms is the geodesic one. Unfortunately, this distance does not always achieve satisfying results on noisy or blurred images. Recently, a new pseudo-distance, called the minimum barrier distance (MBD), more robust to pixel fluctuation, has been introduced. Some years after, Géraud et al. proposed a good and fast-to-compute approximation of this distance: the Dahu pseudo-distance. Since this distance was initially developed for grayscale images, we propose here an extension of this transform to multivariate images; we call it the vectorial Dahu pseudo-distance. This new distance is easily and efficiently computed thanks to the multivariate tree of shapes (MToS). We propose an efficient way to compute this distance and its deduced saliency map in this thesis. We also investigate the properties of this distance in dealing with noise and blur in the image. This distance has been proved to be robust for pixel invariant. To validate this new distance, we provide benchmarks demonstrating how the vectorial Dahu pseudo-distance is more robust and competitive compared to other MB-based distances. This distance is promising for salient object detection, shortest path finding, and object segmentation. Moreover, we apply this distance to detect the document in videos. Our method is a region-based approach which relies on visual saliency deduced from the Dahu pseudo-distance. We show that the performance of our method is competitive with state-of-the-art methods on the ICDAR Smartdoc 2015 Competition dataset.
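
As background for the minimum barrier distance mentioned in the abstract, the following is a minimal sketch of the grayscale MBD only, not the thesis's Dahu or MToS-based method. It assumes the usual definition (the barrier of a path is its maximum value minus its minimum value, and the MBD is the smallest barrier over all paths), a 4-connected grid, and a brute-force strategy that tries every candidate lower bound. The function name `minimum_barrier_distance` and its parameters are hypothetical.

```python
import heapq


def minimum_barrier_distance(image, seed):
    """Exact grayscale MBD from `seed` to every pixel (4-connectivity).

    For each candidate lower bound `low`, only pixels with value >= low
    are kept, and the path with the smallest maximum value is found with
    a bottleneck variant of Dijkstra's algorithm. The smallest value of
    (maximum - low) over all candidates is the minimum barrier.
    """
    rows, cols = len(image), len(image[0])
    sr, sc = seed
    seed_value = image[sr][sc]
    inf = float("inf")

    # A path through the seed cannot have a minimum above the seed value.
    candidates = sorted({v for row in image for v in row if v <= seed_value})

    best = [[inf] * cols for _ in range(rows)]
    for low in candidates:
        # upper[r][c]: smallest achievable path maximum in the restricted grid.
        upper = [[inf] * cols for _ in range(rows)]
        upper[sr][sc] = seed_value
        heap = [(seed_value, sr, sc)]
        while heap:
            top, r, c = heapq.heappop(heap)
            if top > upper[r][c]:
                continue  # stale heap entry
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if image[nr][nc] < low:
                    continue  # pixel excluded by the current lower bound
                cand = max(top, image[nr][nc])
                if cand < upper[nr][nc]:
                    upper[nr][nc] = cand
                    heapq.heappush(heap, (cand, nr, nc))
        for r in range(rows):
            for c in range(cols):
                if upper[r][c] < inf:
                    best[r][c] = min(best[r][c], upper[r][c] - low)
    return best


if __name__ == "__main__":
    # A bright pixel sits between two dark corners; a path through the
    # bottom row avoids it, so the two dark corners are at distance 0.
    toy = [[1, 9, 1],
           [1, 1, 1]]
    for row in minimum_barrier_distance(toy, (0, 0)):
        print(row)
```

This brute-force version runs one bottleneck search per candidate value, so it is only meant to illustrate the definition on small images; the abstract's point is precisely that faster approximations such as the Dahu pseudo-distance exist.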

Documents

Bibtex (lrde.bib)

@PhDThesis{	  movn.20.phd,
  author	= {Minh {\^On V\~{u} Ng\d{o}c}},
  title		= {Improvement of a text detection chain and the proposition
		  of a new evaluation protocol for text detection
		  algorithms},
  school	= {Sorbonne Universit\'e},
  year		= 2020,
  address	= {Paris, France},
  month		= feb,
  abstract	= {Hierarchical image representations are widely used in
		  image processing to model the content of an image in a
		  multi-scale structure. A well-known hierarchical
		  representation is the tree of shapes (ToS), which encodes
		  the inclusion relationship between connected components
		  from different thresholded levels. This kind of tree is
		  self-dual, contrast-change invariant, and popular in the
		  computer vision community. In our work, we use this
		  representation to compute a new distance which belongs to
		  the mathematical morphology domain. Distance
		  transforms and the saliency maps they induce are generally
		  used in image processing, computer vision, and pattern
		  recognition. One of the most commonly used distance
		  transforms is the geodesic one. Unfortunately, this
		  distance does not always achieve satisfying results on
		  noisy or blurred images. Recently, a new pseudo-distance,
		  called the minimum barrier distance (MBD), more robust to
		  pixel fluctuation, has been introduced. Some years after,
		  G\'{e}raud et al. proposed a good and fast-to-compute
		  approximation of this distance: the Dahu pseudo-distance.
		  Since this distance was initially developed for grayscale
		  images, we propose here an extension of this transform to
		  multivariate images; we call it the vectorial Dahu
		  pseudo-distance. This new distance is easily and
		  efficiently computed thanks to the multivariate tree of
		  shapes (MToS). We propose an efficient way to compute this
		  distance and its deduced saliency map in this thesis. We
		  also investigate the properties of this distance in dealing
		  with noise and blur in the image. This distance has been
		  proved to be robust for pixel invariant. To validate this
		  new distance, we provide benchmarks demonstrating how the
		  vectorial Dahu pseudo-distance is more robust and
		  competitive compared to other MB-based distances. This
		  distance is promising for salient object detection,
		  shortest path finding, and object segmentation. Moreover,
		  we apply this distance to detect the document in videos.
		  Our method is a region-based approach which relies on
		  visual saliency deduced from the Dahu pseudo-distance. We
		  show that the performance of our method is competitive with
		  state-of-the-art methods on the ICDAR Smartdoc 2015
		  Competition dataset.}
}