A first step toward a fair comparison of evaluation protocols for text detection algorithms

From LRDE

Abstract

Text detection is an important topic in pattern recognition, but evaluating the reliability of such detection algorithms is challenging. While many evaluation protocols have been developed for that purpose, they often show dissimilar behaviors when applied in the same context. As a consequence, their usage may lead to misinterpretations, potentially yielding erroneous comparisons between detection algorithms or incorrect parameter tuning. This paper is a first attempt to derive a methodology for comparing evaluation protocols. We then apply it to five state-of-the-art protocols and show that inconsistencies indeed exist among their evaluation criteria. Our aim here is not to rank the investigated evaluation protocols, but rather to raise awareness in the community that we should carefully reconsider them in order to converge toward their optimal usage.

Documents

Bibtex (lrde.bib)

@InProceedings{	  dangla.18.das,
  author	= {Aliona Dangla and {\'E}lodie Puybareau and Guillaume
		  Tochon and Jonathan Fabrizio},
  title		= {A first step toward a fair comparison of evaluation
		  protocols for text detection algorithms},
  booktitle	= {Proceedings of the IAPR International Workshop on Document
		  Analysis Systems (DAS)},
  year		= {2018},
  month		= apr,
  address	= {Vienna, Austria},
  abstract	= {Text detection is an important topic in pattern
		  recognition, but evaluating the reliability of such
		  detection algorithms is challenging. While many evaluation
		  protocols have been developed for that purpose, they often
		  show dissimilar behaviors when applied in the same context.
		  As a consequence, their usage may lead to
		  misinterpretations, potentially yielding erroneous
		  comparisons between detection algorithms or incorrect
		  parameter tuning. This paper is a first attempt to derive
		  a methodology for comparing evaluation protocols. We then
		  apply it to five state-of-the-art protocols and show that
		  inconsistencies indeed exist among their evaluation
		  criteria. Our aim here is not to rank the investigated
		  evaluation protocols, but rather to raise awareness in the
		  community that we should carefully reconsider them in
		  order to converge toward their optimal usage.}
}