Text categorization algorithms using semantic approaches, corpus-based thesaurus and WordNet

  • Authors:
  • Cheng Hua Li;Ju Cheng Yang;Soon Cheol Park

  • Affiliations:
  • Department of Mathematics, Statistics and Computer Science, St. Francis Xavier University, Antigonish, Nova Scotia, Canada B2G 2W5 and School of Information Technology, Jiangxi University of Finan ...

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

In this paper, a corpus-based thesaurus and WordNet were used to improve text categorization performance. We employed the k-NN algorithm and the back propagation neural network (BPNN) algorithm as the classifiers. k-NN is a simple, well-known approach to categorization, and BPNNs have been widely used in the categorization and pattern recognition fields. However, the standard BPNN has some generally acknowledged limitations, such as slow training speed and a tendency to become trapped in a local minimum. To alleviate these problems, two modified versions, Morbidity neurons Rectified BPNN (MRBP) and Learning Phase Evaluation BPNN (LPEBP), were considered and applied to text categorization. We conducted the experiments on both the standard Reuters-21578 data set and the 20 Newsgroups data set. Experimental results showed that our proposed methods achieved high categorization effectiveness as measured by precision, recall and F-measure.
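
As a rough illustration of the k-NN classifier and the evaluation measures named in the abstract, the following minimal sketch classifies a document by majority vote among its k most similar training documents and then computes precision, recall and F-measure for one category. It is not the authors' implementation: the whitespace tokenizer, raw term-frequency weighting, cosine similarity, the toy `TRAIN` data and all function names are assumptions made for illustration, and the thesaurus and WordNet steps are left out.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed: naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(doc, training, k=3):
    """Label `doc` by majority vote among the k most similar training documents.

    `training` is a list of (text, label) pairs.
    """
    query = vectorize(doc)
    ranked = sorted(
        training,
        key=lambda pair: cosine(query, vectorize(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(true_labels, predicted, category):
    """Per-category precision, recall and F-measure (F1)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy training set (hypothetical; loosely modeled on Reuters-style topics).
TRAIN = [
    ("wheat corn grain harvest", "grain"),
    ("grain exports wheat prices", "grain"),
    ("corn crop harvest yield", "grain"),
    ("oil crude barrel prices", "crude"),
    ("crude oil opec output", "crude"),
    ("barrel opec oil exports", "crude"),
]

if __name__ == "__main__":
    print(knn_classify("wheat harvest yield", TRAIN, k=3))
    print(knn_classify("opec crude output", TRAIN, k=3))
```

A semantic approach of the kind the abstract describes would presumably enrich or replace the raw terms in `vectorize` with related terms from a thesaurus or WordNet before similarity is computed; that step is omitted here because the abstract does not specify how it is done.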