An unsupervised feature selection algorithm: laplacian score combined with distance-based entropy measure

  • Authors:
  • Rongye Liu;Ning Yang;Xiangqian Ding;Lintao Ma

  • Affiliations:
  • Department of Information Science and Engineering, Ocean University of China, Qingdao, China;The Center of Information Engineering, Ocean University of China, Qingdao, China;The Center of Information Engineering, Ocean University of China, Qingdao, China;The Center of Information Engineering, Ocean University of China, Qingdao, China

  • Venue:
  • IITA '09: Proceedings of the 3rd International Conference on Intelligent Information Technology Application
  • Year:
  • 2009


Abstract

In the unsupervised learning paradigm, class labels are not given: which features should we keep? Unsupervised feature selection addresses this problem and has proven effective at selecting features from unlabeled data. Laplacian Score (LS) is a recently proposed unsupervised feature selection algorithm. However, it relies on k-means clustering to select the top k features, so the drawbacks of k-means directly affect its results and increase its complexity. In this paper, we introduce a novel algorithm called LSE (Laplacian Score combined with distance-based Entropy measure) for automatically selecting a subset of features. LSE replaces the k-means clustering step of LS with a distance-based entropy measure, which intrinsically resolves the drawbacks of LS and contributes to the stability and efficiency of LSE. We compare LSE with LS on six UCI data sets. Experimental results demonstrate that LSE outperforms LS in stability and efficiency, especially on high-dimensional datasets.
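For context, the two ingredients named in the abstract can be sketched as follows: the Laplacian Score ranks each feature by how well it preserves local neighborhood structure (lower is better), and a distance-based entropy measure scores data for cluster structure without running k-means. This is a minimal illustrative sketch, not the authors' implementation; it assumes a kNN heat-kernel similarity graph for the Laplacian Score and a Dash-and-Liu-style distance entropy as a stand-in for the paper's measure.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Laplacian Score (He et al., 2005) per feature; lower scores mark
    features that better preserve local structure. Illustrative sketch."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # kNN graph with heat-kernel weights (index 0 of argsort is the point itself)
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]
        S[i, idx] = np.exp(-d2[i, idx] / t)
    S = np.maximum(S, S.T)                 # symmetrize the graph
    D = np.diag(S.sum(axis=1))             # degree matrix
    L = D - S                              # graph Laplacian
    d = np.diag(D)
    scores = []
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()    # remove the degree-weighted mean
        num = f_tilde @ L @ f_tilde
        den = f_tilde @ D @ f_tilde
        scores.append(num / den if den > 0 else np.inf)
    return np.array(scores)

def distance_entropy(X):
    """Distance-based entropy: low when pairwise distances are either very
    small or very large (clear clusters), high for unstructured data.
    Hypothetical stand-in for the paper's entropy measure."""
    n = X.shape[0]
    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    iu = np.triu_indices(n, k=1)
    dn = d[iu] / d[iu].max()               # normalize distances into [0, 1]
    dn = np.clip(dn, 1e-12, 1 - 1e-12)     # guard the logarithms
    return float(-np.sum(dn * np.log2(dn) + (1 - dn) * np.log2(1 - dn)))
```

Under this reading of the abstract, one would rank features by Laplacian Score and then use the entropy of progressively larger top-ranked subsets, rather than k-means clustering quality, to decide automatically how many features to keep.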