This paper proposes a method for automatically indexing journal contents pages, reducing the manual effort previously required to enter paper information and construct an index. We describe the various contents-page formats used by journals, whose features differ from those of general documents. The principal elements we want to represent are the title, authors, and page number of each paper. These three elements are modeled according to the order in which they are arranged, and their features are then generalized. A contents analysis system was implemented on the basis of this modeling method to verify it. The system takes as input grayscale images scanned at more than 300 dpi, analyzes the structural features of the contents page, and classifies titles, authors, and page numbers using an efficient projection method. Each item is classified by region and then automatically extracted as index information; this region-by-region segmentation also supports character recognition. In experiments applying the system to some of the six proposed models, it achieved a 97.3% success rate across various journals.
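The abstract does not spell out its projection method, so the following is only a minimal sketch of the general technique, under stated assumptions. It uses standard projection profiles on a binarized page: row ink counts (horizontal projection) split the page into text lines, and column ink counts within each line (vertical projection) split a line into segments. The fixed binarization threshold, the `word_gap` and `page_zone` parameters, and the labeling rule in `label_lines` are all hypothetical. The rule encodes one illustrative layout (a title line ending in a right-aligned page number, followed by an author line) and is not one of the paper's six models.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]   # binary image: 1 = ink (dark) pixel, 0 = background
Span = Tuple[int, int]    # half-open interval [start, end)


def binarize(gray: List[List[int]], threshold: int = 128) -> Image:
    """Map a grayscale image (0 = black, 255 = white) to 1 for ink, 0 for paper.
    A fixed global threshold is an assumption; the paper does not specify one."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def horizontal_profile(img: Image) -> List[int]:
    """Ink-pixel count of each row; empty rows separate text lines."""
    return [sum(row) for row in img]


def vertical_profile(img: Image, top: int, bottom: int) -> List[int]:
    """Ink-pixel count of each column, restricted to rows [top, bottom)."""
    width = len(img[0]) if img else 0
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Split a profile into runs of nonzero values; runs separated by
    fewer than min_gap empty positions are merged into one."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image, word_gap: int = 3) -> List[Tuple[Span, List[Span]]]:
    """Horizontal projection finds text lines; vertical projection within each
    line finds column segments (gaps narrower than word_gap are bridged)."""
    result = []
    for top, bottom in runs(horizontal_profile(img)):
        cols = runs(vertical_profile(img, top, bottom), min_gap=word_gap)
        result.append(((top, bottom), cols))
    return result


def label_lines(img: Image, word_gap: int = 3, page_zone: float = 0.8):
    """HYPOTHETICAL layout rule (one illustrative model, not the paper's):
    a line whose last segment starts in the right-hand page_zone is a title
    line ending in a page number; any other line is an author line."""
    width = len(img[0]) if img else 0
    labels = []
    for rows, cols in segment_lines(img, word_gap):
        if len(cols) > 1 and cols[-1][0] >= page_zone * width:
            labels.append(("title", rows, cols[:-1]))
            labels.append(("page", rows, cols[-1:]))
        else:
            labels.append(("author", rows, cols))
    return labels


def toy_contents_image() -> Image:
    """5x20 synthetic grayscale page: a title with a right-aligned page
    number (rows 0-1), a blank row, then an author line (row 3)."""
    gray = [[255] * 20 for _ in range(5)]
    for r in (0, 1):
        for c in list(range(0, 6)) + [17, 18]:
            gray[r][c] = 0
    for c in range(2, 9):
        gray[3][c] = 0
    return binarize(gray)


if __name__ == "__main__":
    for label, rows, cols in label_lines(toy_contents_image()):
        print(label, rows, cols)
```

On the toy page, this prints `title (0, 2) [(0, 6)]`, then `page (0, 2) [(17, 19)]`, then `author (3, 4) [(2, 9)]`. Projection profiles are cheap because each pixel is visited once per pass, which fits the abstract's emphasis on an efficient method. A real system would still need a separate rule set for each layout model, plus tuned gap thresholds for 300 dpi scans.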