Evolutionary model selection in unsupervised learning

  • Authors:
  • YongSeog Kim; W. Nick Street; Filippo Menczer

  • Affiliations:
  • Department of Business Information Systems, Utah State University, Logan, UT 84322-3515, USA
  • Corresponding author: YongSeog Kim (Tel.: +1 435 797 2271; Fax: +1 435 797 2351; E-mail: ykim@b202.usu.edu)

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2002


Abstract

Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and, possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning settings, with some estimate of accuracy used to evaluate candidate subsets. However, supervised learning is often inapplicable because no training signal is available. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions approximating the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method consistently finds approximate Pareto-optimal solutions, from which we can identify the significant features and an appropriate number of clusters. This yields models with better and clearer semantic relevance.
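The core idea of the abstract (evaluating candidate feature subsets by the quality of the clusters they induce, and keeping only Pareto-non-dominated candidates) can be illustrated with a minimal sketch. This is not the authors' ELSA implementation: it enumerates feature masks instead of evolving them, fixes the number of clusters at two, and uses a plain K-means with two objectives, within-cluster scatter (minimize) and subset size (minimize). All names and the synthetic data are illustrative assumptions.

```python
# Minimal sketch (not the paper's ELSA): score feature subsets by
# K-means cluster compactness plus subset size, then keep the
# Pareto-non-dominated candidates. Data and names are illustrative.
import random

random.seed(0)

def make_data(n=60):
    # Features 0 and 1 carry two-cluster structure; 2 and 3 are noise.
    data = []
    for i in range(n):
        cx, cy = (0.0, 0.0) if i < n // 2 else (5.0, 5.0)
        data.append([cx + random.gauss(0, 0.5),
                     cy + random.gauss(0, 0.5),
                     random.uniform(-5, 5),
                     random.uniform(-5, 5)])
    return data

def project(row, mask):
    # Keep only the features selected by the 0/1 mask.
    return [v for v, m in zip(row, mask) if m]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

def kmeans_scatter(points, k=2, iters=20):
    # Lloyd's algorithm; returns within-cluster sum of squared
    # distances (lower = more compact clusters).
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k),
                         key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [mean(c) if c else random.choice(points)
                   for c in clusters]
    return sum(min(dist2(p, c) for c in centers) for p in points)

def dominates(a, b):
    # a Pareto-dominates b: no worse on all objectives, better on one.
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

data = make_data()
candidates = []
for bits in range(1, 16):  # every non-empty mask over 4 features
    mask = [(bits >> i) & 1 for i in range(4)]
    pts = [project(r, mask) for r in data]
    objs = (kmeans_scatter(pts), sum(mask))  # (scatter, n_features)
    candidates.append((mask, objs))

pareto = [c for c in candidates
          if not any(dominates(o[1], c[1]) for o in candidates)]
for mask, (wss, nf) in sorted(pareto, key=lambda c: c[1][1]):
    print(mask, round(wss, 1), nf)
```

In the paper's setting the population of (feature subset, number of clusters) solutions is evolved by ELSA rather than enumerated, EM is used alongside K-means, and several cluster-quality heuristics serve as objectives; the sketch only shows the Pareto bookkeeping that underlies that search.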