Exploiting word cluster information for unsupervised feature selection

  • Authors:
  • Qingyao Wu;Yunming Ye;Michael Ng;Hanjing Su;Joshua Huang

  • Affiliations:
  • Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China;Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China;Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong;Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China;Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China

  • Venue:
  • PRICAI'10 Proceedings of the 11th Pacific Rim International Conference on Trends in Artificial Intelligence
  • Year:
  • 2010

Abstract

This paper presents an approach for integrating word clustering information into unsupervised feature selection. In our scheme, the words in the whole feature space are clustered into groups based on their co-occurrence statistics. The resulting word cluster information is combined with the bag-of-words information to measure the goodness of each word, which is our basic metric for selecting discriminative features. By exploiting word cluster information, we extend three well-known unsupervised feature selection methods and propose three new methods. A series of experiments is performed on three benchmark text data sets (20 Newsgroups, Reuters-21578 and CLASSIC3). The experimental results show that the new unsupervised feature selection methods select more discriminative features and, in turn, improve clustering performance.
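
The abstract does not specify the clustering algorithm, the three base selection methods, or the rule for combining the two sources of information, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a greedy clustering of words by cosine similarity of their document-occurrence sets, document frequency as the bag-of-words score, and a linear blend controlled by a hypothetical weight `alpha`. All function names and parameters here are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def doc_sets(docs):
    """Map each word to the set of document indices that contain it."""
    occ = defaultdict(set)
    for i, doc in enumerate(docs):
        for word in doc:
            occ[word].add(i)
    return dict(occ)


def cosine(a, b):
    """Cosine similarity between two binary occurrence sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cluster_words(occ, threshold=0.6):
    """Greedy single-pass clustering (an assumption, not the paper's algorithm).

    A word joins the first cluster whose seed word it co-occurs with
    strongly enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of words; the first is the seed
    for word in sorted(occ):
        for cluster in clusters:
            if cosine(occ[word], occ[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def score_words(docs, alpha=0.5, threshold=0.6):
    """Blend each word's own document frequency with its cluster's mean.

    The linear blend with weight alpha is a hypothetical combination rule.
    """
    occ = doc_sets(docs)
    n_docs = len(docs)
    df = {w: len(s) / n_docs for w, s in occ.items()}
    scores = {}
    for cluster in cluster_words(occ, threshold):
        cluster_df = sum(df[w] for w in cluster) / len(cluster)
        for w in cluster:
            scores[w] = (1 - alpha) * df[w] + alpha * cluster_df
    return scores


def select_features(docs, k, alpha=0.5, threshold=0.6):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = score_words(docs, alpha, threshold)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = [
    ["ball", "goal", "team"],
    ["ball", "goal"],
    ["ball", "goal", "rare"],
    ["stock", "market"],
]
top_words = select_features(docs, k=2)
```

In this toy corpus, `ball` and `goal` always appear together and so fall into one cluster, while `rare` and `team` each remain alone. Replacing document frequency with another unsupervised score only requires changing how `df` is computed.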