Automatic text categorization based on content analysis with cognitive situation models

  • Authors:
  • Yi Guo; Zhiqing Shao; Nan Hua

  • Affiliations:
  • Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China and Shanghai Key Laboratory of Computer Software Evaluation and Testing, Sha ...; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; The Telecommunication Engineering Institute, The Air Force Engineering University, Xi'an 710077, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2010

Abstract

Text categorization is an important research area in text mining. Its original purpose is to recognize, understand and organize different types of texts or documents. Conventional categorization approaches are treated as supervised learning, inferring similarity from a collection of pre-categorized texts used for training. These existing approaches are not content-oriented and are constrained to the single-word level. This paper introduces an innovative content-oriented text categorization approach named CogCate. Inspired by cognitive situation models, CogCate exploits the cognitive procedure humans follow when categorizing texts. In addition to traditional statistical analysis at the word level, CogCate applies lexical and semantic analysis, which helps ensure categorization accuracy. Evaluation experiments demonstrate the performance of CogCate. CogCate also substantially reduces the time and effort spent on software training and on maintaining text collections. Our research indicates that interdisciplinary efforts benefit text categorization.
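For contrast, the conventional word-level supervised approach that the abstract describes can be sketched in a few lines. This is not CogCate. It is a minimal illustration, under assumed choices, of "statistical analysis at word level": each category is summarized by the summed bag-of-words counts of its training texts (a simple centroid), and a new text is assigned to the category whose centroid has the highest cosine similarity. The helper names (`tokenize`, `train_centroids`, `categorize`) and the toy training data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Single-word level only: lowercase alphabetic tokens, no syntax or semantics.
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(labeled_docs):
    """Sum term counts per category (a simple bag-of-words centroid)."""
    centroids = defaultdict(Counter)
    for text, label in labeled_docs:
        centroids[label].update(tokenize(text))
    return dict(centroids)


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(text, centroids):
    # Pick the category whose centroid is most similar to the text's word counts.
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training set for illustration.
training_docs = [
    ("stock market shares fell", "finance"),
    ("bank interest rates rise", "finance"),
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
]
centroids = train_centroids(training_docs)
print(categorize("the match ended with a late goal", centroids))  # sport
```

Because such a model only counts isolated words, it cannot tell, for example, "river bank" from "bank rates" without further lexical or semantic analysis, which is the kind of limitation the abstract attributes to word-level approaches.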