Text extraction for spam-mail image filtering using a text color estimation technique

  • Authors:
  • Ji-Soo Kim;S. H. Kim;H. J. Yang;H. J. Son;W. P. Kim

  • Affiliations:
  • Computer Science Dept., Chonnam National University, Korea;Computer Science Dept., Chonnam National University, Korea;Computer Science Dept., Chonnam National University, Korea;Computer Science Dept., Chonnam National University, Korea;Computer Science Dept., Chonnam National University, Korea

  • Venue:
  • IEA/AIE'07 Proceedings of the 20th international conference on Industrial, engineering, and other applications of applied intelligent systems
  • Year:
  • 2007

Abstract

In this paper, we propose an algorithm for extracting text regions from images in spam-mails. The algorithm, Color Layer-Based Text Extraction (CLTE), extracts connected components on eight color-layer planes and then classifies them as either text or non-text regions. We also propose an algorithm to recover damaged text strokes in Korean text images. There are two types of damaged strokes: (1) middle strokes such as '???' or '--' are deleted, and (2) the first and last strokes such as '???' or '???' are filled with black pixels. An experiment with 200 spam-mail images shows that the proposed approach is over 10% more accurate than conventional methods.
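
The abstract does not say how the eight planes are built or which rules separate text from non-text, so the following is only a rough sketch of the general idea, not the authors' CLTE implementation. It assumes each RGB channel is thresholded at 128 to give 2 x 2 x 2 = 8 color planes, uses 4-connectivity for connected components, and applies a hypothetical size and aspect-ratio rule. The names `color_plane`, `connected_components`, and `is_text_like`, and all threshold values, are illustrative assumptions.

```python
from collections import deque


def color_plane(pixel, threshold=128):
    """Map an (r, g, b) pixel to a plane index 0..7.

    Assumption: one bit per channel, set when the channel >= threshold.
    """
    r, g, b = pixel
    return (int(r >= threshold) << 2) | (int(g >= threshold) << 1) | int(b >= threshold)


def connected_components(image):
    """Return [(plane, [(row, col), ...]), ...] for a 2D list of RGB tuples.

    Pixels belong to the same component when they are 4-connected
    neighbours and fall on the same color plane.
    """
    height, width = len(image), len(image[0])
    planes = [[color_plane(p) for p in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if seen[r][c]:
                continue
            plane = planes[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill within one plane
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and planes[ny][nx] == plane):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append((plane, pixels))
    return components


def is_text_like(pixels, min_area=2, max_area=50, max_aspect=5.0):
    """Hypothetical classifier: keep components of moderate size and aspect ratio."""
    rows = [y for y, _ in pixels]
    cols = [x for _, x in pixels]
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    aspect = max(h, w) / min(h, w)
    return min_area <= len(pixels) <= max_area and aspect <= max_aspect


# Usage: a 4x6 white image containing a 2x2 red block.
WHITE, RED = (255, 255, 255), (200, 0, 0)
image = [[WHITE] * 6 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = RED

components = connected_components(image)
candidates = [(plane, px) for plane, px in components if is_text_like(px, max_area=10)]
```

In this toy input the large white background is rejected by the area limit and only the small red block remains as a candidate. A real filter would need thresholds tuned to the image resolution and font sizes.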