Beyond Redundancies: A Metric-Invariant Method for Unsupervised Feature Selection

  • Authors: Yuexian Hou; Peng Zhang; Tingxu Yan; Wenjie Li; Dawei Song

  • Affiliations: Tianjin University, Tianjin and The Hong Kong Polytechnic University, Hong Kong; The Robert Gordon University, Aberdeen; Tianjin University, Tianjin; The Hong Kong Polytechnic University, Hong Kong; The Robert Gordon University, Aberdeen

  • Venue: IEEE Transactions on Knowledge and Data Engineering
  • Year: 2010

Abstract

A fundamental goal of unsupervised feature selection is denoising, which aims to identify and reduce noisy features that are not discriminative. Because no information about the real classes is available, denoising is a challenging task. Noisy features can distort a reasonable distance metric and yield unreasonable feature spaces, i.e., feature spaces in which common clustering algorithms cannot effectively find the real classes. To overcome this problem, we start from the observation that the relevance of a feature is intrinsic and independent of any metric scaling of the feature space. This observation implies that feature selection should be invariant, at least to some extent, with respect to metric scaling. In this paper, we clarify the necessity of considering metric invariance in unsupervised feature selection and propose a novel model that incorporates it. Our method is motivated by the following observation: if the statistic that guides the unsupervised feature selection process is invariant with respect to possible metric scalings, the solution of the model will also be invariant. Hence, if a metric-invariant model can distinguish discriminative features from noisy ones in a reasonable feature space, it will also work on the unreasonable counterpart obtained from the reasonable one by metric scaling. A theoretical justification of the metric invariance of the proposed model is given, and an empirical evaluation demonstrates its promising performance.
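
The invariance argument in the abstract can be illustrated with a small toy experiment. The sketch below is not the model proposed in the paper, whose statistic the abstract does not specify. As an assumption made purely for illustration, it uses kurtosis (the standardized fourth moment) as a stand-in for a scale-invariant guiding statistic and contrasts it with plain variance, which is not scale-invariant. The data, the function names, and the scaling factor are all hypothetical.

```python
import random


def variance(xs):
    """Population variance; multiplying xs by c multiplies this by c**2."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def kurtosis(xs):
    """Standardized fourth moment; unchanged when xs is multiplied by c != 0."""
    m = sum(xs) / len(xs)
    v = variance(xs)
    return sum((x - m) ** 4 for x in xs) / len(xs) / v ** 2


def make_data(n=2000, seed=0):
    """Toy data: feature 0 has two well-separated clusters, feature 1 is pure noise."""
    rng = random.Random(seed)
    discriminative = [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.1) for _ in range(n)]
    noisy = [rng.gauss(0.0, 0.5) for _ in range(n)]
    return [discriminative, noisy]


def rescale(features, factors):
    """Apply a per-feature metric scaling (one positive factor per feature)."""
    return [[f * x for x in col] for col, f in zip(features, factors)]


def pick_by_variance(features):
    """Select the feature with the largest variance (scale-dependent criterion)."""
    scores = [variance(col) for col in features]
    return scores.index(max(scores))


def pick_by_kurtosis(features):
    """Select the feature with the smallest kurtosis (bimodal data scores low)."""
    scores = [kurtosis(col) for col in features]
    return scores.index(min(scores))


# "Reasonable" space, and an "unreasonable" one where the noisy feature is inflated.
original = make_data()
scaled = rescale(original, [1.0, 10.0])

for name, data in (("original", original), ("scaled", scaled)):
    print(name, "variance picks", pick_by_variance(data),
          "| kurtosis picks", pick_by_kurtosis(data))
```

In this toy setting the variance criterion switches to the noisy feature once that feature is stretched, while the kurtosis criterion keeps selecting the clustered feature in both spaces. Kurtosis is used here only because its scale invariance is easy to verify; it is not claimed to be a good general-purpose selection criterion.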