Text extraction from graphical document images using sparse representation

Authors:
Thai V. Hoang; Salvatore Tabbone
Affiliations:
Hanoi University of Technology, Hanoi, Vietnam; Hanoi University of Technology, Hanoi, Vietnam
Venue:
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Year:
2010

Abstract

This paper presents a novel method for extracting text from graphical document images. Graphical document images, which contain both text and graphics components, are treated as two-dimensional signals in which text and graphics have different morphological characteristics. The proposed algorithm relies on a sparse representation framework with two appropriately chosen, discriminative overcomplete dictionaries: each gives a sparse representation of one type of component and a non-sparse representation of the other. Text and graphics components are separated by promoting sparse representations of the input image over these two dictionaries. In post-processing, heuristic rules group the extracted text components into text strings. The proposed method overcomes the problem of text touching graphics. Preliminary experiments show promising results on different types of documents.
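The abstract does not name the two dictionaries, so the following is only a minimal sketch of the general idea: separating a signal by promoting sparsity over two dictionaries, in the style of morphological component analysis. It splits a toy 1-D signal into an oscillatory part, assumed sparse in an orthonormal DCT, and a spiky part, assumed sparse in the identity (Dirac) basis. The DCT/Dirac pair, the hard-thresholding schedule, and the names `mca_separate`, `n_iter`, and `t_min` are illustrative assumptions, not the authors' dictionaries or algorithm; for document images, the two dictionaries would instead be chosen so that one represents text sparsely and the other graphics.

```python
import math


def _scale(k, n):
    # Normalisation that makes the DCT-II matrix orthonormal.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Analysis with the orthonormal DCT-II (stand-in 'oscillatory' dictionary)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Synthesis: the transpose (= inverse) of the orthonormal DCT-II."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def hard_threshold(v, t):
    """Keep only coefficients whose magnitude exceeds t."""
    return [a if abs(a) > t else 0.0 for a in v]


def mca_separate(x, n_iter=100, t_min=1e-3):
    """Hypothetical helper: split x into (oscillatory, spiky) parts.

    Dictionary 1: DCT atoms (assumed sparse for the oscillatory part).
    Dictionary 2: Dirac/identity atoms (assumed sparse for isolated spikes).
    """
    n = len(x)
    osc = [0.0] * n
    spk = [0.0] * n
    # Start the threshold at the largest coefficient in either dictionary.
    t_max = max(max(abs(a) for a in dct(x)), max(abs(a) for a in x))
    for it in range(n_iter):
        # Linearly decreasing threshold schedule.
        t = t_max - (t_max - t_min) * it / max(n_iter - 1, 1)
        # Re-estimate the DCT part from what the spike part does not explain.
        osc = idct(hard_threshold(dct([xi - si for xi, si in zip(x, spk)]), t))
        # Re-estimate the spike part (identity dictionary) from the new residual.
        spk = hard_threshold([xi - oi for xi, oi in zip(x, osc)], t)
    return osc, spk


# Toy 1-D signal: two DCT atoms plus two isolated spikes.
N = 64
coeffs = [0.0] * N
coeffs[3], coeffs[10] = 5.0, -4.0
osc_true = idct(coeffs)
spk_true = [0.0] * N
spk_true[20], spk_true[45] = 6.0, -5.0
x = [a + b for a, b in zip(osc_true, spk_true)]

osc_est, spk_est = mca_separate(x)
```

Each pass re-estimates one component from the residual left by the other and keeps only coefficients above a threshold that shrinks over the iterations. Because each dictionary represents its own component with a few large coefficients but spreads the other component over many small ones, the large coefficients tend to be assigned to the correct component first.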