Machine Learning for Intelligent Processing of Printed Documents

Authors:
Floriana Esposito;Donato Malerba;Francesca A. Lisi
Affiliations:
Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4, 70125 Bari, Italy. esposito@di.uniba.it;Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4, 70125 Bari, Italy. malerba@di.uniba.it;Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4, 70125 Bari, Italy. lisi@di.uniba.it
Venue:
Journal of Intelligent Information Systems - Special issue on methodologies for intelligent information systems
Year:
2000

Abstract

A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.