A three-step preprocessing algorithm for minimizing e-mail document's atypical characteristics

  • Authors:
  • Ok-Ran Jeong;Dong-Sub Cho

  • Affiliations:
  • Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea;Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea

  • Venue:
  • FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
  • Year:
  • 2005

Abstract

Documents in wide use today include many atypical characteristics. In particular, non-standard language appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification may differ significantly depending on the characteristics of the documents being classified, as well as on the classifier's performance. We propose a staged, three-step preprocessing algorithm to support accurate automatic classification for each e-mail category. This research identifies the characteristics of e-mail documents and applies the three-step preprocessing algorithm to minimize their atypical characteristics.