Evolutionary model selection in unsupervised learning

  • Authors:
  • YongSeog Kim; W. Nick Street; Filippo Menczer

  • Affiliations:
  • Department of Business Information Systems, Utah State University, Logan, UT 84322-3515, USA
  • Corresponding author: YongSeog Kim (Tel.: +1 435 797 2271; Fax: +1 435 797 2351; E-mail: ykim@b202.usu.edu)

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2002


Abstract

Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and, possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning settings, with some estimate of accuracy used to evaluate candidate subsets. However, supervised learning is often inapplicable because no training signal is available. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions approximating the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method consistently finds approximate Pareto-optimal solutions, from which we can identify the significant features and an appropriate number of clusters. This yields models with better and clearer semantic relevance.
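The core idea of the abstract (evaluating candidate feature subsets by the quality of the clusters they induce, and keeping only Pareto-non-dominated candidates) can be illustrated with a minimal sketch. This is not the authors' ELSA implementation: it enumerates feature masks instead of evolving them, fixes the number of clusters at two, and uses a plain K-means with two objectives, within-cluster scatter (minimize) and subset size (minimize). All names and the synthetic data are illustrative assumptions.

```python
# Minimal sketch (not the paper's ELSA): score feature subsets by
# K-means cluster compactness plus subset size, then keep the
# Pareto-non-dominated candidates. Data and names are illustrative.
import random

random.seed(0)

def make_data(n=60):
    # Features 0 and 1 carry two-cluster structure; 2 and 3 are noise.
    data = []
    for i in range(n):
        cx, cy = (0.0, 0.0) if i < n // 2 else (5.0, 5.0)
        data.append([cx + random.gauss(0, 0.5),
                     cy + random.gauss(0, 0.5),
                     random.uniform(-5, 5),
                     random.uniform(-5, 5)])
    return data

def project(row, mask):
    # Keep only the features selected by the 0/1 mask.
    return [v for v, m in zip(row, mask) if m]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

def kmeans_scatter(points, k=2, iters=20):
    # Lloyd's algorithm; returns within-cluster sum of squared
    # distances (lower = more compact clusters).
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k),
                         key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [mean(c) if c else random.choice(points)
                   for c in clusters]
    return sum(min(dist2(p, c) for c in centers) for p in points)

def dominates(a, b):
    # a Pareto-dominates b: no worse on all objectives, better on one.
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

data = make_data()
candidates = []
for bits in range(1, 16):  # every non-empty mask over 4 features
    mask = [(bits >> i) & 1 for i in range(4)]
    pts = [project(r, mask) for r in data]
    objs = (kmeans_scatter(pts), sum(mask))  # (scatter, n_features)
    candidates.append((mask, objs))

pareto = [c for c in candidates
          if not any(dominates(o[1], c[1]) for o in candidates)]
for mask, (wss, nf) in sorted(pareto, key=lambda c: c[1][1]):
    print(mask, round(wss, 1), nf)
```

In the paper's setting the population of (feature subset, number of clusters) solutions is evolved by ELSA rather than enumerated, EM is used alongside K-means, and several cluster-quality heuristics serve as objectives; the sketch only shows the Pareto bookkeeping that underlies that search.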