A precise skew estimation algorithm for document images using KNN clustering and Fourier transform


Abstract

In this article, we propose a simple and precise skew estimation algorithm for binarized document images. The estimation is performed in the frequency domain. To get a precise result, the Fourier transform is not applied to the document itself but the document is preprocessed: all regions of the document are clustered using a KNN and contours of grouped regions are smoothed using the convex hull to form more regular shapes, with better orientation. No assumption has been made concerning the nature or the content of the document. This method has been shown to be very accurate and was ranked first at the DISEC'13 contest, during the ICDAR competitions.
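
The frequency-domain step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it leaves out the KNN grouping and the convex-hull smoothing entirely, and the function names (`estimate_skew`, `synthetic_stripes`), the angle range, the angular step and the line-sampling scheme are all assumptions made for illustration. It relies only on the fact that parallel lines in an image concentrate spectral energy along a line through the centre of the spectrum, perpendicular to them.

```python
import numpy as np


def estimate_skew(binary_img, max_angle=15.0, step=0.1):
    """Estimate skew in degrees (x to the right, y downwards, |angle| < 45)."""
    img = np.asarray(binary_img, dtype=float)
    n_rows, n_cols = img.shape
    # Magnitude spectrum, mean removed, zero frequency shifted to the centre.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))
    cy, cx = n_rows // 2, n_cols // 2

    # Parallel lines put their energy on a line through the centre that is
    # perpendicular to them. Sample one bin per spectrum row on one side of
    # the centre (the magnitude spectrum of a real image is symmetric).
    ky = np.arange(1, n_rows - cy)
    angles = np.arange(-max_angle, max_angle + step, step)
    energy = np.empty(len(angles))
    for i, theta in enumerate(np.deg2rad(angles)):
        # Column offset for this candidate angle, corrected for aspect ratio.
        kx = np.rint(-ky * np.tan(theta) * n_cols / n_rows).astype(int)
        kx = np.clip(kx, -cx, n_cols - cx - 1)
        energy[i] = spectrum[cy + ky, cx + kx].sum()
    return float(angles[np.argmax(energy)])


def synthetic_stripes(n, u, v):
    """Binary n x n stripes whose spectrum peaks at the integer bin (u, v)."""
    y, x = np.mgrid[0:n, 0:n]
    return np.cos(2 * np.pi * (u * x + v * y + 0.5) / n) > 0


if __name__ == "__main__":
    # Stripes following y = (2 / 23) x, i.e. a skew of about 4.97 degrees.
    print(estimate_skew(synthetic_stripes(256, -2, 23)))
```

Sampling one bin per spectrum row keeps any bin from being counted twice for a given candidate angle. The precision of this sketch is bounded by `step` and by the image size, so on a real page it would presumably be run on the preprocessed image the abstract describes, not on the raw binarized document.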

Bibtex (lrde.bib)

@InProceedings{	  fabrizio.14.icip,
  author	= {Jonathan Fabrizio},
  title		= {A precise skew estimation algorithm for document images
		  using {KNN} clustering and Fourier transform},
  booktitle	= {Proceedings of the 21st International Conference on Image
		  Processing (ICIP)},
  year		= 2014,
  address	= {Paris, France},
  pages		= {2585--2588},
  abstract	= {In this article, we propose a simple and precise skew
		  estimation algorithm for binarized document images. The
		  estimation is performed in the frequency domain. To get a
		  precise result, the Fourier transform is not applied to the
		  document itself but the document is preprocessed: all
		  regions of the document are clustered using a KNN and
		  contours of grouped regions are smoothed using the convex
		  hull to form more regular shapes, with better orientation.
		  No assumption has been made concerning the nature or the
		  content of the document. This method has been shown to be
		  very accurate and was ranked first at the DISEC'13 contest,
		  during the ICDAR competitions.}
}