Improving the training set of an OCR k-NN classifier



One stage of an OCR toolchain classifies the detected characters, which can be lowercase letters, capital letters, or digits. To do so, our OCR computes a wavelet-based descriptor for each character image and then classifies that descriptor. The classification step currently relies on a multiclass k-NN classifier. Because the testing step depends heavily on the number of samples in the training set, the training set can be modified to improve the scores. Our work focuses on possible improvements to the training set.
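
As an illustration of the classification step, the following is a minimal sketch of a multiclass k-NN vote, not the OCR's actual implementation. The 2-D tuples stand in for the wavelet-based descriptors, and the Euclidean distance, the value k = 3, and the name `knn_classify` are assumptions made for the example. Each query is compared against every training sample, which is why the testing step depends so directly on the size of the training set.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training samples closest to query.

    train: list of (descriptor, label) pairs; descriptors are equal-length tuples.
    query: descriptor to classify.
    Hypothetical helper: Euclidean distance and k=3 are assumptions.
    """
    # Every training sample is visited, so the cost grows with len(train).
    neighbours = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set: three classes (lowercase letter, capital letter, digit).
train = [
    ((0.0, 0.1), "a"), ((0.1, 0.0), "a"), ((0.2, 0.1), "a"),
    ((1.0, 0.9), "B"), ((0.9, 1.0), "B"), ((1.1, 1.0), "B"),
    ((5.0, 5.1), "7"), ((5.1, 5.0), "7"), ((4.9, 5.0), "7"),
]

print(knn_classify(train, (0.05, 0.05)))  # prints: a
```

Adding or removing samples from `train` changes both the number of distance computations per query and the neighbours that take part in the vote, which is the lever the work on the training set relies on.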