High-level Document Structure Analysis in the Olena Scribo Module

From LRDE

Abstract

Extracting the structure of a digitized document relies on a processing chain composed of elementary algorithms. High-level document structure analysis builds on this same processing chain. It also takes into account more abstract information about how the document is composed, which gives additional clues about its structure. From these clues and from the document's structure data, it is then possible to perform high-level analyses such as identifying the reading direction, extracting specific elements, or converting the document into another format.