A Statistical, Nonparametric Methodology for Document Degradation Model Validation

Authors:
Tapas Kanungo;Robert M. Haralick;Werner Stuetzle;Henry S. Baird;David Madigan
Affiliations:
Univ. of Maryland, College Park;Univ. of Washington, Seattle;Univ. of Washington, Seattle;Xerox Palo Alto Research Center, Palo Alto, CA;Soliloquy Inc., New York, NY
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2000

Quantified Score

Hi-index	0.15

Abstract

Printing, photocopying, and scanning processes degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. Models allow us to predict system performance, conduct controlled experiments to study the breakdown points of the systems, create large multilingual data sets with groundtruth for training classifiers, design optimal noise removal algorithms, choose values for the free parameters of the algorithms, and so on. Although research in document understanding started many decades ago, only two document degradation models have been proposed thus far. Furthermore, no attempts have been made to statistically validate these models. In this paper, we present a statistical methodology that can be used to validate local degradation models. This method is based on a nonparametric, two-sample permutation test. Another standard statistical device, the power function, is then used to choose between algorithm variables such as distance functions. Since the validation and the power function procedures are independent of the model, they can be used to validate any other degradation model. A method for comparing any two models is also described. It uses p-values associated with the estimated models to select the model that is closer to the real world.
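
The two-sample permutation test named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `permutation_test` and `abs_mean_difference` are hypothetical, and scalar feature values stand in for the real and simulated character images that the paper compares. The idea is that, if both samples come from the same distribution, relabeling the pooled observations at random should produce a test statistic at least as large as the observed one fairly often.

```python
import random
from statistics import mean


def abs_mean_difference(a, b):
    """Illustrative test statistic (an assumption, not the paper's distance)."""
    return abs(mean(a) - mean(b))


def permutation_test(sample_x, sample_y, statistic, n_permutations=999, seed=0):
    """Estimate the p-value of a two-sample permutation test.

    sample_x, sample_y: sequences of observations (here: plain numbers).
    statistic: function of two samples; larger values mean "more different".
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    observed = statistic(sample_x, sample_y)
    pooled = list(sample_x) + list(sample_y)
    n_x = len(sample_x)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            at_least_as_extreme += 1

    # Count the observed labeling itself so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    real = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05]       # made-up feature values
    simulated = [1.0, 0.95, 1.15, 0.85, 1.1, 1.0]  # made-up feature values
    print(permutation_test(real, simulated, abs_mean_difference))
```

A small p-value suggests the two samples differ, which in the validation setting would count as evidence against the degradation model that produced the simulated sample. Repeating such a test over many simulated samples with a known parameter offset and recording the rejection rate is one way to estimate a power function empirically; the choice of statistic here is only a placeholder for the distance functions the abstract mentions.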