Exploiting WWW Resources in Experimental Document Analysis Research

Authors:
Daniel P. Lopresti
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002

Abstract

Many large collections of document images are now becoming available online as part of digital library initiatives, fueled by the explosive growth of the World Wide Web. In this paper, we examine protocols and system-related issues that arise in attempting to make use of these new resources, both as a target application (building better search engines) and as a way of overcoming the problem of acquiring ground-truth to support experimental document analysis research. We also report on our experiences running two simple tests involving data drawn from one such collection. The potential synergies between document analysis and digital libraries could lead to substantial benefits for both communities.