Improving OCR k-NN classifier's training set

From LRDE


Abstract

One step of an OCR toolchain is to classify the detected characters, which can be lowercase letters, capital letters, or digits. To do so, our OCR computes a wavelet-based descriptor for each character image, and this descriptor is then classified. The classification step currently relies on a multiclass k-NN classifier. Since the testing step heavily depends on the number of samples in the training set, the training set can be modified to improve the scores. Our work focuses on possible improvements to the training set.
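To make the classification step concrete, here is a minimal sketch of a multiclass k-NN classifier in Python. It is not the OCR's actual implementation: the function name `knn_classify`, the Euclidean distance, the majority vote, and the toy 2-D descriptors are all assumptions chosen for illustration, standing in for the real wavelet-based descriptors.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training samples nearest to query.

    train: list of (descriptor, label) pairs (hypothetical layout).
    query: descriptor with the same dimension as the training descriptors.
    """
    # Every training sample is compared with the query, so the cost of one
    # classification grows with the size of the training set.
    neighbours = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    # Majority vote over the labels of the k nearest samples.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy 2-D descriptors (made-up values, not real wavelet descriptors).
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B"),
    ((2.0, 0.0), "7"), ((2.1, 0.1), "7"), ((1.9, 0.2), "7"),
]

print(knn_classify(train, (0.05, 0.1)))
```

In this sketch each query scans the whole training set, which is why removing, adding, or relabelling training samples affects both the running time and the result of the testing step.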