Validation of Image Defect Models for Optical Character Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
A Statistical, Nonparametric Methodology for Document Degradation Model Validation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
A Line Drawings Degradation Model for Performance Characterization
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Low quality document image modeling and enhancement
International Journal on Document Analysis and Recognition
International Journal on Document Analysis and Recognition
This paper presents an efficient parametrization method for generating synthetic noise on document images. Given the desired categories and amount of noise, the method generates synthetic document images exhibiting most of the degradations observed in real document images (ink splotches, white specks, or streaks). Because it can simulate different amounts and kinds of noise, it can be used to evaluate the robustness of many document image analysis methods. It also makes it possible to generate data for algorithms that employ a learning process. The degradation model presented in [7] needs eight parameters to generate noise regions at random. We propose an extension of this model that sets the eight parameters automatically, so as to generate precisely what a user wants (amount and category of noise). Our proposal consists of three steps. First, Nsp seed-points (i.e. centres of noise regions) are selected by an adaptive procedure. Then, these seed-points are classified into three categories of noise using a heuristic rule. Finally, the size of each noise region is set using a random process in order to generate degradations that are as realistic as possible.
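
The three steps can be illustrated with a minimal Python sketch. The abstract does not specify the adaptive selection procedure, the heuristic rule, or the size distribution, so every concrete choice here is an assumption made for illustration only: seed-points are drawn uniformly at random, the three category labels (`on_ink`, `on_edge`, `on_background`) and the 4-neighbour rule are hypothetical, and region sizes are uniform radii. The image is assumed to be a binary grid with 1 for ink and 0 for background.

```python
import random

# Hypothetical category labels; the abstract only says there are three.
CATEGORIES = ("on_ink", "on_edge", "on_background")


def select_seed_points(image, n_sp, rng):
    """Step 1 (stand-in): draw n_sp distinct pixel positions uniformly.

    The paper uses an adaptive procedure that is not described in the
    abstract; uniform sampling is only a placeholder for it.
    """
    height, width = len(image), len(image[0])
    all_pixels = [(y, x) for y in range(height) for x in range(width)]
    return rng.sample(all_pixels, n_sp)


def classify_seed(image, y, x):
    """Step 2 (stand-in): label a seed by comparing it to its 4-neighbours."""
    height, width = len(image), len(image[0])
    neighbours = [
        image[ny][nx]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
        if 0 <= ny < height and 0 <= nx < width
    ]
    # A seed touching a pixel of the other colour sits on a stroke edge.
    if any(value != image[y][x] for value in neighbours):
        return "on_edge"
    return "on_ink" if image[y][x] == 1 else "on_background"


def draw_region_size(rng, max_radius=3.0):
    """Step 3 (stand-in): random radius for the noise region (assumed uniform)."""
    return rng.uniform(1.0, max_radius)


def plan_noise_regions(image, n_sp, seed=0):
    """Run the three steps and return one description per noise region."""
    rng = random.Random(seed)
    regions = []
    for y, x in select_seed_points(image, n_sp, rng):
        regions.append({
            "center": (y, x),
            "category": classify_seed(image, y, x),
            "radius": draw_region_size(rng),
        })
    return regions


if __name__ == "__main__":
    # Toy 5x5 page with a one-pixel-wide vertical stroke in the middle column.
    page = [[0, 0, 1, 0, 0] for _ in range(5)]
    for region in plan_noise_regions(page, n_sp=6, seed=1):
        print(region)
```

In this sketch the user-facing inputs are only the number of seed-points and a random seed; a fuller version would map the requested amount and category of noise onto the eight parameters of the underlying model, which the abstract does not detail.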