Noise removal in document images typically follows one of two approaches. In the first, a human classifies the noise present in an image and selects a filter accordingly. In the second, a batch of filters is applied blindly to the image. The first approach, although widely used, may itself introduce noise when the noise is misclassified or the filter parameters are unsuitable. This paper presents a new paradigm for document image filtering. It aims at a more accurate and computationally efficient document cleanup by pre-characterizing the noise present in a document on the basis of a set of human-labeled training samples. The project currently focuses on pre-characterizing the following types of noise: back-to-front interference (bleed-through), skew and orientation, blur, and framing.
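The idea of characterizing the noise first and filtering second can be illustrated with a minimal sketch. It is not the method of the paper, whose classifier and features are not described here: a nearest-centroid classifier stands in for the noise characterization step, the two-element feature vectors are invented for illustration, and the entries of `FILTERS` are hypothetical placeholders that only name the filter they would run.

```python
from collections import defaultdict
from math import dist

def train_centroids(samples):
    """Average the feature vectors of human-labeled samples, per noise label.

    samples: list of (feature_vector, noise_label) pairs. The features are
    an assumption of this sketch, not taken from the paper.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify_noise(centroids, features):
    """Return the noise label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical placeholder filters: each only reports which filter would run.
FILTERS = {
    "bleed_through": lambda image: f"bleed-through filter applied to {image}",
    "skew": lambda image: f"deskew filter applied to {image}",
    "blur": lambda image: f"de-blur filter applied to {image}",
    "framing": lambda image: f"frame removal applied to {image}",
}

def clean(image, features, centroids):
    """Characterize the noise first, then apply only the matching filter."""
    label = classify_noise(centroids, features)
    return label, FILTERS[label](image)

# Invented training data: two features per sample, labeled by a human.
training = [
    ([0.9, 0.1], "blur"),
    ([0.8, 0.2], "blur"),
    ([0.1, 0.9], "skew"),
    ([0.2, 0.8], "skew"),
]
centroids = train_centroids(training)
label, result = clean("page_001", [0.7, 0.3], centroids)
```

Only one filter runs per image in this sketch, which is where the computational saving over blindly applying a batch of filters would come from. A document with several kinds of noise would need a multi-label variant.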