Logical Labeling of Arabic Newspapers using Artificial Neural Nets
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents
UAHCI'07 Proceedings of the 4th international conference on Universal access in human-computer interaction: applications and services
XCDF: a canonical and structured document format
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Abstract: This paper discusses logical labeling of documents, a basic step in logical structure recognition. Logical labels have to be assigned to the text blocks that compose the layout structure. Our study is based on physical characteristics with a visual aspect: typographic, geometric and/or topological attributes. Our objective is to map a low-level logical structure, consisting of a set of logical labels, onto the components of the extracted layout structure, which requires building a model that performs this mapping. However, the documents we consider have varied layout and logical structures, so we chose to perform this task by supervised learning from a set of training documents. This allows us to define a generic method for the problem without imposing any constraint on document structure. We propose a probabilistic model represented by a Bayesian network (BN), a graphical model that we use here as a classifier. A prototype has been implemented and applied to tables of contents in periodicals.
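To make the idea of a Bayesian network used as a classifier concrete, the following is a minimal sketch, not the model from the paper. It assumes the simplest possible network structure (naive Bayes, where the label node is the only parent of every feature node), since the abstract does not describe the actual structure. The feature names (font_size, bold, alignment), the label names, and the training blocks are all hypothetical and chosen only to resemble text blocks in a table of contents.

```python
from collections import Counter, defaultdict
import math

# Hypothetical training set: (physical features of a text block, logical label).
TRAINING_BLOCKS = [
    ({"font_size": "large", "bold": True, "alignment": "left"}, "section_title"),
    ({"font_size": "large", "bold": True, "alignment": "left"}, "section_title"),
    ({"font_size": "medium", "bold": False, "alignment": "left"}, "article_title"),
    ({"font_size": "medium", "bold": True, "alignment": "left"}, "article_title"),
    ({"font_size": "small", "bold": False, "alignment": "left"}, "author"),
    ({"font_size": "small", "bold": False, "alignment": "left"}, "author"),
    ({"font_size": "small", "bold": False, "alignment": "right"}, "page_number"),
    ({"font_size": "medium", "bold": False, "alignment": "right"}, "page_number"),
]


def train(blocks):
    """Count label and (label, feature, value) occurrences in labeled blocks."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    feature_values = defaultdict(set)      # feature name -> values seen in training
    for features, label in blocks:
        label_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return label_counts, feature_counts, feature_values


def classify(model, features):
    """Return the label maximizing log P(label) + sum of log P(feature | label)."""
    label_counts, feature_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = feature_counts[(label, name)][value]
            n_values = len(feature_values[name])
            # Laplace (add-one) smoothing keeps unseen values from zeroing the score.
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_BLOCKS)
new_block = {"font_size": "large", "bold": True, "alignment": "left"}
print(classify(model, new_block))
```

A richer Bayesian network could add edges between feature nodes (for example between font size and boldness) at the cost of larger conditional probability tables. The naive structure is used here only because it keeps the sketch short.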