Text extraction method based on wavelets

From LRDE

Abstract

We present a method to differentiate text from non-text regions in an image using descriptors inspired by data compression algorithms. The main goal of this approach is to compute a signal that allows a learning system (such as k-NN or SVM) to classify the text and background of an image. This signal is computed with a wavelet-based method similar to the one used in the JPEG 2000 compression format. We also discuss which kind of wavelet gives the best results, and with which kind of learning system. These descriptors are then compared with other descriptors that are not wavelet-based. Finally, various ways to reduce the time spent computing the descriptors, such as wavelet lifting, are presented.
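To make the idea of wavelet lifting more concrete, here is a minimal sketch of one level of the Haar transform written as a predict step followed by an update step. It is only an illustration under stated assumptions: the Haar wavelet, the 1-D signal, and the function names (haar_lifting_forward, haar_lifting_inverse, detail_energy) are chosen for this example and are not necessarily the wavelet or the descriptor used in the method itself.

```python
def haar_lifting_forward(signal):
    """One level of the (unnormalized) Haar transform computed by lifting.

    Hypothetical helper for illustration; expects an even-length sequence.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    even = list(signal[0::2])
    odd = list(signal[1::2])
    # Predict step: each odd sample is predicted by its even neighbour,
    # and the prediction error becomes the detail coefficient.
    detail = [o - e for o, e in zip(odd, even)]
    # Update step: correct the even samples so that the approximation
    # equals the average of each pair.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the signal."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


def detail_energy(signal):
    """Mean squared detail coefficient: an assumed toy descriptor.

    Flat regions give a low value; sharp transitions, which are frequent
    in text strokes, give a high value.
    """
    _, detail = haar_lifting_forward(signal)
    return sum(d * d for d in detail) / len(detail)
```

On a row of pixel intensities, a flat background such as [5, 5, 5, 5] yields a detail energy of zero, while an alternating pattern such as [0, 10, 0, 10] yields a large one. A value of this kind, computed per block and per direction, is the sort of signal that could be passed to a classifier. Lifting computes the coefficients in place with a few additions per sample, which is why it is of interest for reducing computation time.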