RELATIONAL DATA MINING AND ILP FOR DOCUMENT IMAGE UNDERSTANDING

Authors:
Michelangelo Ceci;Margherita Berardi;Donato Malerba
Affiliations:
Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy;Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy;Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
Venue:
Applied Artificial Intelligence
Year:
2007



Abstract

Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. This recognition process relies on domain-specific knowledge that can be acquired automatically by applying data mining techniques. Because page layout is inherently spatial, classification methods developed in inductive logic programming (ILP) and multi-relational data mining (MRDM) are the most suitable candidates for this task. In this paper, both approaches are considered and empirically compared on three data sets consisting of multi-page articles published in an international journal and of historical documents. The ILP method learns recursive logical theories that express dependencies between logical components, while the MRDM method extends the naïve Bayesian classifier to data stored in multiple tables of a relational database. Experimental results confirm the importance of the spatial dimension for this application. They also show that the ILP method tends to be conservative, with a high percentage of omission errors and a low percentage of commission errors, while the probabilistic nature of the MRDM method makes it possible to trade off the two types of error.
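
The trade-off between the two error types can be illustrated with a minimal sketch. It is not the paper's method: it assumes that an omission error is a true component left unlabeled and a commission error is a component labeled wrongly, and that a probabilistic classifier outputs a posterior score compared against a decision threshold. The function name `error_rates`, the scores, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def error_rates(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (omission_rate, commission_rate) for one decision threshold.

    Assumed definitions (not taken from the paper):
      omission   = positive example scored below the threshold (missed)
      commission = negative example scored at or above it (wrongly labeled)
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    omissions = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    commissions = sum(
        1 for s, y in zip(scores, labels) if not y and s >= threshold
    )
    return omissions / positives, commissions / negatives


# Hypothetical posterior probabilities for one logical label (e.g. "title").
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.05]
labels = [True, True, True, True, False, False, False, False]

for threshold in (0.9, 0.5, 0.1):
    omission, commission = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: omission={omission}, commission={commission}")
```

In this toy data a strict threshold behaves conservatively (many omissions, no commissions), while lowering the threshold reduces omissions at the cost of more commissions. A classifier that outputs only hard decisions offers no such adjustable threshold.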