A Probabilistic Approach to Feature Selection for Multi-class Text Categorization

  • Authors:
  • Ke Wu;Bao-Liang Lu;Masao Uchiyama;Hitoshi Isahara

  • Affiliations:
  • Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China; Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan; Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan

  • Venue:
  • ISNN '07: Proceedings of the 4th International Symposium on Neural Networks: Advances in Neural Networks
  • Year:
  • 2007

Abstract

In this paper, we propose a probabilistic approach to feature selection for multi-class text categorization. Specifically, we regard the document class and the occurrence of each feature as events, calculate the probability of occurrence of each feature using the law of total probability, and use these values as a ranking criterion. Experiments on the Reuters-2000 collection show that the proposed method can yield better performance than information gain and χ-square, two well-known feature selection methods.
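
The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one way the described idea could look, not the authors' implementation. It assumes that P(t | c) is estimated as the fraction of class-c documents containing term t, and that the score is P(t) = Σ_c P(t | c) · P(c). The function name `rank_features`, the `priors` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict


def rank_features(docs, labels, priors=None):
    """Rank terms by P(t) = sum over classes c of P(t | c) * P(c).

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    priors : optional dict {class: P(c)}; defaults to empirical class
             frequencies (an assumption; the abstract does not say
             how the class probabilities are chosen)
    """
    class_sizes = Counter(labels)
    n_docs = len(docs)
    if priors is None:
        priors = {c: n / n_docs for c, n in class_sizes.items()}

    # doc_freq[c][t] = number of class-c documents that contain term t
    doc_freq = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        doc_freq[c].update(set(tokens))

    # Law of total probability: accumulate P(t | c) * P(c) over classes
    scores = defaultdict(float)
    for c, counts in doc_freq.items():
        for t, n in counts.items():
            scores[t] += (n / class_sizes[c]) * priors[c]

    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy usage with made-up documents
docs = [["oil", "price"], ["oil", "export"], ["wheat", "price"]]
labels = ["crude", "crude", "grain"]
ranked = rank_features(docs, labels, priors={"crude": 0.5, "grain": 0.5})
top_terms = [t for t, _ in ranked[:2]]
```

Under these assumptions, empirical priors make the score equal to the term's overall document frequency divided by the number of documents, so the choice of P(c) (for example, uniform priors as in the usage lines) is what distinguishes the ranking from plain document-frequency thresholding. How the paper itself handles this is not stated in the abstract.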