We present a method for selecting a relevant subset of dimensions, with little or no loss of information, for clustering and outlier detection in high-dimensional datasets. The search for a relevant dimension subset is heuristic and based on a genetic algorithm. For clustering, the genetic algorithm's fitness function uses the validity indexes of classification algorithms: we first use these indexes to select a dimension subset and then to evaluate the clustering quality in that subspace. For outlier detection, the fitness function is an individual distance-based function. We evaluate the performance of this dimension selection approach on simulations with different high-dimensional datasets for both applications (clustering and outlier detection). Furthermore, because the number of selected dimensions is low, the datasets can be displayed in order to visually evaluate and interpret the results.
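
To make the idea of a genetic search over dimension subsets concrete, here is a minimal sketch for the outlier-detection case. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not specify the fitness function or the genetic operators, so the k-nearest-neighbour score, the per-dimension penalty, the tournament selection, and the names `knn_distance`, `outlier_fitness`, and `select_dimensions` are all hypothetical choices.

```python
import math
import random


def knn_distance(data, dims, i, k):
    """Euclidean distance from point i to its k-th nearest neighbour,
    measured only on the dimensions listed in dims."""
    dists = sorted(
        math.sqrt(sum((data[i][d] - data[j][d]) ** 2 for d in dims))
        for j in range(len(data))
        if j != i
    )
    return dists[k - 1]


def outlier_fitness(data, mask, k=1, penalty=0.05):
    """ASSUMED fitness (not taken from the paper): how far the most isolated
    point stands out from the average point, minus a small cost per selected
    dimension so that smaller subsets are preferred."""
    dims = [d for d, bit in enumerate(mask) if bit]
    if not dims:
        return float("-inf")  # an empty subset carries no information
    scores = [knn_distance(data, dims, i, k) for i in range(len(data))]
    mean = sum(scores) / len(scores)
    if mean == 0:
        return float("-inf")  # all points coincide in this subspace
    return max(scores) / mean - penalty * len(dims)


def select_dimensions(data, fitness, pop_size=20, generations=30,
                      mutation_rate=0.1, seed=0):
    """Genetic search over bit masks (1 = keep the dimension).
    Requires at least 2 dimensions and pop_size >= 3."""
    rng = random.Random(seed)
    n_dims = len(data[0])

    def score(mask):
        return fitness(data, mask)

    population = [[rng.randint(0, 1) for _ in range(n_dims)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            # Tournament selection: best of 3 randomly drawn individuals.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n_dims)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    return [d for d, bit in enumerate(best) if bit]


if __name__ == "__main__":
    # Toy data: dimension 0 contains an outlier, dimension 1 is constant.
    toy = [[0, 5], [1, 5], [2, 5], [3, 5], [100, 5]]
    print(select_dimensions(toy, outlier_fitness))
```

For clustering, the same search loop could be reused by passing a different `fitness` callable, for example one that clusters the data in the candidate subspace and returns a validity index. Which index to use is left open here, since the abstract does not name one.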