Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms

Authors: Minh Ôn Vũ Ngoc
Place: Paris, France
Type: phdthesis
Projects: Olena
Date: 2020-02-01

Abstract

Hierarchical image representations are widely used in image processing to model the content of an image as a multi-scale structure. A well-known hierarchical representation is the tree of shapes (ToS), which encodes the inclusion relationship between connected components obtained at different threshold levels. This tree is self-dual, invariant to contrast changes, and popular in the computer vision community. In this work, we use this representation to compute a new distance that belongs to the field of mathematical morphology. Distance transforms and the saliency maps they induce are commonly used in image processing, computer vision, and pattern recognition. One of the most commonly used distance transforms is the geodesic one. Unfortunately, this distance does not always give satisfactory results on noisy or blurred images. Recently, a new pseudo-distance called the minimum barrier distance (MBD), more robust to pixel fluctuations, has been introduced. Some years later, Géraud et al. proposed a good and fast-to-compute approximation of this distance: the Dahu pseudo-distance. Since this distance was initially developed for grayscale images, we propose here an extension of this transform to multivariate images; we call it the vectorial Dahu pseudo-distance. This new distance is easily and efficiently computed thanks to the multivariate tree of shapes (MToS). In this thesis, we propose an efficient way to compute this distance and the saliency map deduced from it. We also investigate how this distance behaves in the presence of noise and blur, and show that it is robust to pixel fluctuations. To validate this new distance, we provide benchmarks demonstrating that the vectorial Dahu pseudo-distance is more robust than, and competitive with, other MB-based distances. This distance is promising for salient object detection, shortest path finding, and object segmentation. Moreover, we apply this distance to document detection in videos.
Our method is a region-based approach that relies on the visual saliency deduced from the Dahu pseudo-distance. We show that the performance of our method is competitive with state-of-the-art methods on the ICDAR Smartdoc 2015 Competition dataset.
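
To make the notion of a barrier concrete, the following is a minimal sketch of the minimum barrier distance on a tiny grayscale image. It assumes the commonly cited definition, which the abstract does not spell out: the barrier of a path is the difference between the largest and smallest pixel values along it, and the MBD between two pixels is the smallest barrier over all connecting paths. The function name, the 4-connectivity, and the brute-force strategy are illustrative assumptions; this is neither the Dahu pseudo-distance nor the tree-based algorithm developed in the thesis.

```python
from collections import deque


def minimum_barrier_distance(image, seed):
    """Brute-force MBD from `seed` to every pixel (illustrative sketch).

    Assumed definition: barrier(path) = max(values) - min(values) along the
    path, and MBD = minimum barrier over all 4-connected paths.
    `image` is a list of equal-length rows of numbers; `seed` is (row, col).
    """
    rows, cols = len(image), len(image[0])
    levels = sorted({value for row in image for value in row})
    seed_value = image[seed[0]][seed[1]]
    best = [[None] * cols for _ in range(rows)]

    # Try every value band [lo, hi] that contains the seed value. A pixel
    # reachable from the seed while staying inside the band is joined to it
    # by a path whose barrier is at most hi - lo.
    for lo in levels:
        if lo > seed_value:
            break
        for hi in levels:
            if hi < seed_value:
                continue
            width = hi - lo
            seen = {seed}
            queue = deque([seed])
            while queue:
                r, c = queue.popleft()
                if best[r][c] is None or width < best[r][c]:
                    best[r][c] = width
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and lo <= image[nr][nc] <= hi):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return best


# A bright wall (value 5) with a low gap (value 1) in the bottom row.
example = [
    [0, 5, 0],
    [0, 5, 0],
    [0, 1, 0],
]
distances = minimum_barrier_distance(example, (0, 0))
```

In this example the pixels to the right of the wall are reached through the gap, so their distance from the top-left seed is 1 rather than 5. The enumeration over value bands is only practical for very small images; it is meant to illustrate the definition, not to be efficient.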