Machine Learning Methods for Automatically Processing Historical Documents: From Paper Acquisition to XML Transformation

Authors:
F. Esposito; D. Malerba; G. Semeraro; S. Ferilli; O. Altamura; T. M. A. Basile; M. Berardi; M. Ceci; N. Di Mauro
Venue:
Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL'04)
Year:
2004

Citing: 0
Cited by: 4

An Adaptative Recognition System Using a Table Description Language for Hierarchical Table Structures in Archival Documents. In: Graphics Recognition. Recent Advances and New Opportunities

Sample-based collection and adjustment algorithm for metadata extraction parameter of flexible format document. In: ICAISC'10 Proceedings of the 10th International Conference on Artificial Intelligence and Soft Computing: Part II

A minimal and sufficient way of introducing external knowledge for table recognition in archival documents. In: GREC'05 Proceedings of the 6th International Conference on Graphics Recognition: Ten Years Review and Future Perspectives

Beyond digital incunabula: modeling the next generation of digital libraries. In: ECDL'06 Proceedings of the 10th European Conference on Research and Advanced Technology for Digital Libraries

Abstract

One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Because the originals of such material are often unique and scattered across various archives, making them widely accessible is difficult. One solution is to develop intelligent document processing tools that automatically transform printed documents into a web-accessible form such as XML. Here, we propose the use of a document processing system, WISDOM++, which makes heavy use of machine learning techniques to perform this task, and we report promising results obtained in preliminary experiments.