Clustering and classification of document structure-a machine learning approach

Authors:
A. Dengel;F. Dubiel
Affiliations:
-;-
Venue:
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 2)
Year:
1995

Citing: 0
Cited by: 10

The “growing up” of HyperBraille—an office workspace for blind people
Proceedings of the 9th annual ACM symposium on User interface software and technology

An Optimization Methodology for Document Structure Extraction on Latin Character Documents
IEEE Transactions on Pattern Analysis and Machine Intelligence

Adaptive Layout Analysis of Document Images
ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems

Page Classification for Meta-data Extraction from Digital Collections
DEXA '01 Proceedings of the 12th International Conference on Database and Expert Systems Applications

Hidden Tree Markov Models for Document Image Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence

Correcting the Document Layout: A Machine Learning Approach
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Capturing the Layout of Electronic Documents for Reuse in Variable Data Printing
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

A Knowledge Management System Using Bayesian Networks
AI*IA '09: Proceedings of the XIth International Conference of the Italian Association for Artificial Intelligence Reggio Emilia on Emergent Perspectives in Artificial Intelligence

Collective classification for spam filtering
CISIS'11 Proceedings of the 4th international conference on Computational intelligence in security for information systems

The construction complexity of orgraphs: Some mathematical models and their applications
Automatic Documentation and Mathematical Linguistics

Abstract

We describe a system that learns how the logical structures of documents are presented, using business letters as the example domain. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy. This hierarchy then serves as the basis for classifying future input. The paper introduces the individual learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports the results.
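
The abstract does not state the clustering algorithm or the features the system uses, so the following is only an illustrative sketch of the general idea (cluster labeled instances into a hierarchy, then classify new input by descending it), not the authors' method. It assumes, hypothetically, that each layout block is described by a normalized (x, y) page position, swaps in plain centroid-based agglomerative clustering, and uses invented names (`Concept`, `build_hierarchy`, `classify`) and invented sample data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import dist


@dataclass
class Concept:
    """A node of the (hypothetical) concept hierarchy."""
    centroid: tuple[float, ...]   # mean feature vector of all members
    size: int                     # number of training instances below this node
    labels: dict[str, int]        # label counts of the members
    children: list[Concept] = field(default_factory=list)


def merge(a: Concept, b: Concept) -> Concept:
    """Join two concepts under a new parent with a size-weighted centroid."""
    n = a.size + b.size
    centroid = tuple((x * a.size + y * b.size) / n
                     for x, y in zip(a.centroid, b.centroid))
    labels = dict(a.labels)
    for name, count in b.labels.items():
        labels[name] = labels.get(name, 0) + count
    return Concept(centroid, n, labels, [a, b])


def build_hierarchy(instances: list[tuple[tuple[float, ...], str]]) -> Concept:
    """Agglomerative clustering: repeatedly merge the two closest centroids."""
    if not instances:
        raise ValueError("at least one training instance is required")
    nodes = [Concept(tuple(vec), 1, {label: 1}) for vec, label in instances]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes))
                 for j in range(i + 1, len(nodes)))
        i, j = min(pairs,
                   key=lambda p: dist(nodes[p[0]].centroid, nodes[p[1]].centroid))
        parent = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


def classify(root: Concept, vector: tuple[float, ...]) -> str:
    """Descend toward the nearest child until the node carries a single label."""
    node = root
    while node.children and len(node.labels) > 1:
        node = min(node.children, key=lambda c: dist(c.centroid, vector))
    return max(node.labels, key=node.labels.get)


# Invented training data: (normalized x, y) of a block and its logical label.
letters = [
    ((0.10, 0.05), "sender"), ((0.12, 0.06), "sender"),
    ((0.80, 0.20), "date"), ((0.82, 0.22), "date"),
    ((0.10, 0.50), "body"), ((0.10, 0.55), "body"),
    ((0.10, 0.90), "signature"), ((0.12, 0.92), "signature"),
]
hierarchy = build_hierarchy(letters)
print(classify(hierarchy, (0.79, 0.21)))
print(classify(hierarchy, (0.11, 0.88)))
```

The greedy top-down descent is chosen here for brevity. It can follow the wrong branch when concepts overlap, which is one reason a real system would likely need richer features and a more careful matching step.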