Document Understanding System Using Stochastic Context-Free Grammars
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Business cards carry many kinds of information, such as names, addresses and telephone numbers. To use this information effectively, it must be extracted from the cards automatically and stored in a database. The goal of this paper is to extract and recognize characters from color business cards. To separate the foreground from the background of a card, we first assign every pixel to one of eight color types. We then calculate a dynamic threshold from this color information to extract the foreground. Next, we extract the characters in four steps: (i) connected component extraction, (ii) local thresholding, (iii) deletion of marks, lines and noise, and (iv) character grouping. Finally, we recognize the characters with a statistical Chinese and English character recognition system. We tested the system on 30 business cards containing Chinese characters, English characters, numerals and punctuation marks. The extraction rate and accuracy of our system are 96.97% and 95.43%, respectively. The recognition rate is 88.78% for Chinese characters and 97.58% for English characters, numerals and punctuation marks.
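The abstract does not say how the eight color types or the dynamic threshold are defined. One natural reading, used here purely as an assumption, is to binarize each of the R, G and B channels at a fixed cut, giving 2^3 = 8 types (black, blue, green, cyan, red, magenta, yellow, white). The sketch below then treats the most frequent type as the background and places a per-card threshold midway between the mean luminance of background pixels and that of all other pixels. The names `color_type`, `extract_foreground` and the `cut` parameter are hypothetical, and the midpoint rule is a stand-in, not necessarily the authors' threshold.

```python
from collections import Counter

# Hypothetical names for the 2**3 = 8 color types; index bits are (R, G, B), high to low.
COLOR_NAMES = ("black", "blue", "green", "cyan", "red", "magenta", "yellow", "white")


def color_type(pixel, cut=128):
    """Map an (R, G, B) pixel to a color type in 0..7.

    Assumption: each channel is binarized at the fixed value `cut`.
    """
    r, g, b = pixel
    return (int(r >= cut) << 2) | (int(g >= cut) << 1) | int(b >= cut)


def luminance(pixel):
    """ITU-R BT.601 luma, used here as the gray level of a pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def extract_foreground(image, cut=128):
    """Return a 0/1 mask (1 = foreground) for a 2-D list of RGB tuples.

    Assumptions: the most frequent color type is the background, and the
    dynamic threshold is the midpoint between the mean luminance of
    background-type pixels and the mean luminance of all other pixels.
    """
    pixels = [p for row in image for p in row]
    types = [color_type(p, cut) for p in pixels]
    background = Counter(types).most_common(1)[0][0]
    bg = [luminance(p) for p, t in zip(pixels, types) if t == background]
    fg = [luminance(p) for p, t in zip(pixels, types) if t != background]
    if not fg:  # only one color type present: no foreground candidates
        return [[0] * len(row) for row in image]
    bg_mean = sum(bg) / len(bg)
    fg_mean = sum(fg) / len(fg)
    threshold = (bg_mean + fg_mean) / 2
    dark_text = fg_mean < bg_mean  # dark ink on a light card, or the reverse
    return [
        [int(luminance(p) < threshold if dark_text else luminance(p) > threshold)
         for p in row]
        for row in image
    ]
```

Because the threshold is recomputed from each card's own colors, the same code handles dark text on a light card and light text on a dark card.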
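Steps (i) and (iii) can be illustrated with a breadth-first labeling of 8-connected foreground pixels, followed by a filter on the resulting bounding boxes. The abstract gives no criteria for deleting marks, lines and noise, so the rule below (drop components smaller than `min_area` pixels or more elongated than `max_aspect`) and both parameter values are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Label 8-connected regions of a 0/1 mask.

    Returns one (top, left, bottom, right, area) tuple per region, with
    inclusive box coordinates, in raster-scan order of discovery.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right, area))
    return boxes


def drop_noise_and_lines(boxes, min_area=3, max_aspect=10.0):
    """Hypothetical step (iii) filter: discard tiny specks and long thin rules."""
    kept = []
    for top, left, bottom, right, area in boxes:
        height = bottom - top + 1
        width = right - left + 1
        aspect = max(width / height, height / width)
        if area >= min_area and aspect <= max_aspect:
            kept.append((top, left, bottom, right, area))
    return kept
```

A ruled line under a name, for instance, forms one very wide, one-pixel-high component and is removed by the aspect test, while isolated specks fail the area test.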
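Character grouping (step iv) matters especially for Chinese, where one character often consists of several separate components (for example a radical and a phonetic part). As a minimal sketch under stated assumptions, the function below repeatedly merges two bounding boxes when their vertical extents overlap by at least `min_overlap` of the shorter box and the horizontal gap between them is at most `max_gap` pixels. Both thresholds and the name `group_characters` are hypothetical; the paper's actual grouping rules are not given in the abstract.

```python
def group_characters(boxes, max_gap=2, min_overlap=0.5):
    """Merge (top, left, bottom, right) boxes that plausibly form one character.

    Assumption: merge when vertical overlap >= min_overlap * shorter height and
    the number of empty columns between the boxes is <= max_gap. Repeats until
    no pair can be merged; returns boxes sorted by (top, left).
    """
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                overlap = min(a[2], b[2]) - max(a[0], b[0]) + 1
                shorter = min(a[2] - a[0] + 1, b[2] - b[0] + 1)
                gap = max(a[1], b[1]) - min(a[3], b[3]) - 1  # negative if boxes overlap
                if overlap >= min_overlap * shorter and gap <= max_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return sorted(boxes, key=lambda box: (box[0], box[1]))
```

The boxes here are four-tuples, so the area field from a component labeler would be dropped before grouping. A fixed pixel gap is the simplest choice; a gap scaled to the estimated character height would be more robust across cards printed at different font sizes.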