An evolutionary approach for high dimensional attribute selection

Authors:
Lydia Boudjeloud-Assala
Affiliations:
Laboratory of Theoretical and Applied Computer Science, University of Lorraine, LITA EA 3097, Ile du Saulcy, Metz Cedex 01, F-57045, France
Venue:
International Journal of Intelligent Information and Database Systems
Year:
2012

Abstract

We present a method for selecting a relevant subset of dimensions (with little or no loss of information) for clustering and outlier detection in high-dimensional datasets. The subset is found by a heuristic search based on a genetic algorithm. For clustering, the genetic algorithm's fitness function uses the validity indexes of classification algorithms: these indexes are used first to select a dimension subset and then to evaluate the clustering quality in that subspace. For outlier detection, the fitness function is an individual distance-based function. The performance of this dimension selection approach is evaluated on simulations with several high-dimensional datasets for both applications (clustering and outlier detection). Furthermore, because the number of selected dimensions is low, the datasets can be displayed so that the results can be evaluated and interpreted visually.
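The search described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact fitness function, operators, or parameters, so everything below is an assumption chosen for illustration. In particular, `outlier_fitness` (the largest nearest-neighbour distance in the candidate subspace), the fixed subset size `k`, the crossover and mutation operators, and the defaults `pop_size`, `generations`, and the 0.3 mutation rate are all hypothetical.

```python
import math
import random


def outlier_fitness(data, dims):
    """Assumed distance-based fitness: the largest nearest-neighbour
    distance of any point, measured only on the dimensions in `dims`."""
    best = 0.0
    for i, p in enumerate(data):
        nearest = min(
            math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
            for j, q in enumerate(data) if j != i
        )
        best = max(best, nearest)
    return best


def crossover(a, b, k, rng):
    """Build a child by drawing k dimensions from the union of both parents."""
    return frozenset(rng.sample(sorted(a | b), k))


def mutate(subset, n_dims, rng):
    """Swap one selected dimension for one that was not selected."""
    child = set(subset)
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice([d for d in range(n_dims) if d not in subset]))
    return frozenset(child)


def select_dimensions(data, k, fitness=outlier_fitness,
                      pop_size=20, generations=30, seed=0):
    """Genetic search for a k-dimension subset that maximises `fitness`."""
    rng = random.Random(seed)
    n_dims = len(data[0])
    # Each individual is a set of k dimension indices.
    population = [frozenset(rng.sample(range(n_dims), k))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(data, s),
                        reverse=True)
        elite = ranked[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < 0.3:  # assumed mutation rate
                child = mutate(child, n_dims, rng)
            children.append(child)
        population = elite + children
    return max(population, key=lambda s: fitness(data, s))
```

For the clustering application, the same loop could be reused by passing a different `fitness` callable, for example one that clusters the data in the candidate subspace and returns a validity index. Because the returned subset is small (for instance `k = 2` or `k = 3`), the data projected onto it can be plotted directly, which matches the visual evaluation mentioned in the abstract.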