Text Extraction from Gray Scale Document Images Using Edge Information

Authors:
Affiliations:
Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Citing: 0
Cited by: 5

Text - Image Separation in Devanagari Documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

A multi-plane approach for text segmentation of complex document images
Pattern Recognition

Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2

A knowledge-based system for extracting text-lines from mixed and overlapping text/graphics compound document images
Expert Systems with Applications: An International Journal

Text localization and extraction from complex gray images
ICVGIP'06 Proceedings of the 5th Indian conference on Computer Vision, Graphics and Image Processing

Abstract

In this paper we present a well-designed method that uses edge information to extract textual blocks from gray scale document images. It aims at detecting textual regions in heavily noise-corrupted newspaper images and separating them from graphical regions. The algorithm traces the feature points in different entities and then groups the edge points that belong to textual regions. Using line approximation and layout categorization, it can successfully retrieve directionally placed text blocks. Finally, feature-based connected component merging is introduced to gather homogeneous textual regions together within the scope of their bounding rectangles. By handling line segments instead of individual pixels, we obtain correct page decomposition with efficient computation and reduced memory use. The proposed method has been tested on a large group of newspaper images with multiple page layouts, and the promising results confirm its effectiveness.
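
The abstract does not say how the final merging step is carried out, so the following is only a minimal sketch of one plausible reading of it: candidate text regions are represented by their bounding rectangles, and rectangles that overlap or lie within a small gap of each other are repeatedly fused. The names `Rect`, `close_enough`, `union`, `merge_rects`, and the `gap` parameter are all assumptions made for illustration. The proximity test is a simple stand-in for the feature-based criterion the paper mentions, not the authors' actual rule.

```python
from typing import List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) with inclusive corners.
Rect = Tuple[int, int, int, int]


def close_enough(a: Rect, b: Rect, gap: int) -> bool:
    """True if a and b overlap or lie within `gap` pixels on both axes."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def merge_rects(rects: List[Rect], gap: int = 0) -> List[Rect]:
    """Fuse nearby rectangles until no pair is within `gap` of each other."""
    merged = list(rects)
    changed = True
    while changed:
        changed = False
        out: List[Rect] = []
        for r in merged:
            for i, m in enumerate(out):
                if close_enough(r, m, gap):
                    out[i] = union(r, m)  # absorb r into an existing region
                    changed = True
                    break
            else:
                out.append(r)  # r starts a new region
        merged = out
    return merged


if __name__ == "__main__":
    # Two word-sized boxes on one text line plus a distant box.
    boxes = [(0, 0, 10, 5), (12, 0, 20, 5), (100, 100, 110, 105)]
    print(merge_rects(boxes, gap=3))
```

Working on a handful of rectangles rather than on individual pixels is in the same spirit as the abstract's point about handling line segments to reduce computation and memory. A real system would presumably also compare features of the regions before merging them.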