Processing 20th-century censorship cards to support annotation and retrieval poses a number of challenges for many document image analysis (DIA) systems. Problems caused by the low layout quality and standard of such material can be reduced by exploiting the information conveyed by color. In this paper, drawing on lessons learned in the IST project Collate, we propose a new method for image segmentation and layout analysis that takes full advantage of color information. The method has been implemented in the DIA system WISDOM++ and tested on a corpus of multiformat documents concerning historical film censorship.