A platform for storing, visualizing, and interpreting collections of noisy documents
AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data
Developing better systems for document image analysis requires understanding errors, their sources, and their effects. The interactions between the various processing steps are complex, and their details can be obscured by the statistical methods commonly employed. In this paper, we describe tools we are building to help the user view and understand the results of common document analysis procedures. Unlike existing platforms for ground-truthing page images, our system also allows users to visualize the results of automated error analyses. Recognition errors can be corrected interactively, with the effort required to do so recorded as a measure useful in performance evaluation. Beyond this functionality for exploring error behavior, we consider how such tools could be designed to incrementally improve the quality of collections of badly recognized documents as users interact with them on a regular basis. We conclude by discussing topics for future research.
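The abstract mentions recording the effort of interactive error correction as a performance measure. One plausible proxy for such a measure (an assumption for illustration; the paper does not specify the authors' actual metric) is the character-level edit distance between the raw OCR output and the user-corrected text:

```python
# Hypothetical sketch: approximate correction effort as the Levenshtein
# edit distance between noisy OCR output and its corrected form.
# This is an illustrative proxy, not the measure defined in the paper.

def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# Example: effort to repair a noisy OCR line against its corrected form.
ocr_line = "Tne qu1ck brown f0x"
corrected = "The quick brown fox"
print(edit_distance(ocr_line, corrected))  # → 3
```

A real effort measure would more likely count user actions (keystrokes, mouse selections, time spent) rather than string operations, but edit distance gives a lower bound on the textual changes the user must make.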