Automatically detecting and classifying noises in document images

Authors:
Rafael Dueire Lins;Serene Banerjee;Marcelo Thielo
Affiliations:
Universidade Federal de Pernambuco, Recife - Pernambuco, Brazil;HP Labs., Bangalore, India;HP Brazil R&D, Porto Alegre, Brazil
Venue:
Proceedings of the 2010 ACM Symposium on Applied Computing
Year:
2010

Abstract

Filtering to remove noise from document images usually follows one of two approaches. The first relies on a human classifying the noise present in an image in order to choose a suitable filter. The second blindly applies a batch of filters to the image. The former, although widely used, may itself introduce noise when the noise is misclassified or the filter parameters are unsuitable. This paper presents a new paradigm for document image filtering: it aims at more accurate and computationally efficient document cleanup by pre-characterizing the noise present in a document, based on a set of human-labeled training samples. The current focus of the project is the pre-characterization of the following types of noise: back-to-front interference (bleed-through), skew and orientation, blur, and framing.
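
The idea of pre-characterizing noise from human-labeled samples and then applying only the matching filter can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which features or classifier the authors use, so the three-component feature vectors, the nearest-centroid classifier, and the filter names below are hypothetical stand-ins.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """Average the feature vectors of each noise label.

    samples: list of (feature_vector, label) pairs labeled by a human.
    Returns a dict mapping each label to its centroid.
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features per image: (show-through score, edge softness, tilt score).
# The values are invented for illustration, not taken from the paper.
TRAINING = [
    ((0.9, 0.1, 0.1), "bleed-through"),
    ((0.8, 0.2, 0.0), "bleed-through"),
    ((0.1, 0.9, 0.1), "blur"),
    ((0.2, 0.8, 0.2), "blur"),
    ((0.1, 0.1, 0.9), "skew"),
    ((0.0, 0.2, 0.8), "skew"),
]

# Hypothetical mapping from detected noise type to the single filter to apply.
FILTER_FOR = {
    "bleed-through": "remove_back_to_front_interference",
    "blur": "deblur",
    "skew": "deskew",
}

centroids = train_centroids(TRAINING)
label = classify(centroids, (0.15, 0.85, 0.1))
selected_filter = FILTER_FOR[label]
print(label, "->", selected_filter)
```

The point of the sketch is the control flow rather than the classifier: the noise type is predicted first, so one targeted filter runs instead of a whole batch.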