An intelligent method to extract characters in color document with highlight regions
IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part II
This paper presents a novel approach to binarizing document images. Blocks that have their own background intensity values are first extracted from the document image using a two-stage extraction procedure. The intensity distribution of each block is then calculated to determine the variation range of its background intensity. Within each extracted block, pixels whose intensity values fall within this range are regarded as background pixels. Pixels outside all extracted blocks are binarized with Otsu’s global threshold method. To evaluate the developed system, 275 representative document images are collected and binarized using the proposed method and other existing approaches. The binarized images are fed into the same optical character recognition system, and the recognition accuracy on the extracted characters is used to compare the practicability of each method. The proposed document binarization method obtains the highest recognition accuracy.
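The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the blocks are taken as given rectangles (the two-stage extraction procedure is not specified here), the background range of a block is estimated as the histogram mode plus or minus a hypothetical tolerance `tol`, text is assumed darker than the background, and the function names `otsu_threshold`, `background_range`, and `binarize` are invented for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: return t maximizing the between-class variance,
    where pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def background_range(pixels, tol=10, levels=256):
    """ASSUMPTION: approximate the background range of a block as
    (histogram mode - tol, histogram mode + tol), clipped to valid levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    mode = max(range(levels), key=hist.__getitem__)
    return max(0, mode - tol), min(levels - 1, mode + tol)


def binarize(image, blocks, tol=10):
    """Binarize a grayscale image given as a list of rows of ints in 0..255.

    blocks: list of (top, left, bottom, right) rectangles, half-open,
    assumed to come from a separate block-extraction step.
    Output: 255 for background pixels, 0 for foreground (character) pixels.
    """
    height, width = len(image), len(image[0])
    out = [[None] * width for _ in range(height)]

    # Inside each block: pixels within the block's background range are background.
    for top, left, bottom, right in blocks:
        block_pixels = [image[y][x]
                        for y in range(top, bottom)
                        for x in range(left, right)]
        lo, hi = background_range(block_pixels, tol)
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 255 if lo <= image[y][x] <= hi else 0

    # Outside all blocks: one global Otsu threshold (dark text assumed).
    outside = [(y, x) for y in range(height) for x in range(width)
               if out[y][x] is None]
    if outside:
        t = otsu_threshold([image[y][x] for y, x in outside])
        for y, x in outside:
            out[y][x] = 255 if image[y][x] > t else 0
    return out
```

Calling `binarize(image, [(top, left, bottom, right)])` with one rectangle around a highlighted region treats that region's own mid-gray background as background, rather than letting a single global threshold decide for the whole page.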