A platform for storing, visualizing, and interpreting collections of noisy documents

Authors:
Bart Lamiroy; Daniel Lopresti
Affiliations:
Nancy-Université - INPL, Nancy, France; Lehigh University, Bethlehem, PA, USA
Venue:
AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data
Year:
2010

Abstract

The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretations are dependent on application context or user viewpoint, we describe a platform now under development that is capable of managing multiple interpretations for a document and offers an unprecedented level of interaction so that users can freely build upon, extend, or correct existing interpretations. In this way, the system supports the creation of a continuously expanding and improving document analysis repository which can be used to support research in the field.