The Retrieval of Document Images: A Brief Survey

Authors:
David S. Doermann
Affiliations:
-
Venue:
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Year:
1997

Citing: 0
Cited by: 7

Information Retrieval from Documents: A Survey
Information Retrieval

Comparison and Classification of Documents Based on Layout Similarity
Information Retrieval

Error correction vs. query garbling for Arabic OCR document retrieval
ACM Transactions on Information Systems (TOIS)

Effect of OCR error correction on Arabic retrieval
Information Retrieval

CMIC at INEX 2007: Book Search Track
Focused Access to XML Documents

Book search: indexing the valuable parts
Proceedings of the 2008 ACM workshop on Research advances in large digital book repositories

Retrieval of document images based on page layout similarity
AMR'06 Proceedings of the 4th international conference on Adaptive multimedia retrieval: user, context, and feedback

Abstract

The economic feasibility of creating large databases of document images has left a tremendous need for robust ways to access the information these images contain. Printed documents are often scanned for archiving or in an attempt to move toward a paper-less office and stored as images, but without adequate index information. In order to make full use of the capabilities of traditional database indexing and retrieval techniques, a full conversion of the document may be required. There are many factors, however, which may prohibit complete conversion, including its high cost, insufficient document quality, or the fact that parts of the document simply cannot be adequately represented in a converted form. In this paper, we provide a survey of methods developed by researchers to access document images without relying on complete and accurate conversion. We briefly discuss traditional text indexing techniques on imperfect data and the retrieval of partially converted documents, followed by a more complete review of techniques for the direct retrieval and characterization of document images including text, drawings and graphics.