Dematerialization Tools in SCRIBO
- Julien Marquegnies
SCRIBO (Semi-automatic and Collaborative Retrieval of Information Based on Ontologies) is a project on semi-automatic document image analysis that aims to develop algorithms and collaborative tools for extracting knowledge from texts and images. The extraction of the different structures of a digitized document relies on a processing chain whose steps are each crucial to the quality of the final rendering. Deskewing the image, which takes place before the processing chain, is a necessary step that corrects any rotation introduced when the document was digitized. Moreover, extracting and studying features of the characters that make up the text, such as average color, average boldness, or skeleton, not only allows the text to be reconstructed as accurately as possible but also prepares it for OCR. We therefore first introduce an algorithm that quickly detects the skew angle of a document when that angle is small, and then present the study conducted to extract different character features.