An unsupervised feature selection framework based on clustering

Authors:
Sheng-yi Jiang; Lian-xi Wang
Affiliations:
School of Informatics, Guangdong University of Foreign Studies, Guangzhou, China (both authors)
Venue:
PAKDD'11: Proceedings of the 15th International Conference on New Frontiers in Applied Data Mining
Year:
2011

Abstract

Feature selection plays an important role in improving the quality of learning algorithms in machine learning and data mining. It has been widely studied in supervised learning, but remains comparatively under-researched in unsupervised learning. In this work, a clustering-based framework for unsupervised feature selection is proposed. The framework addresses the problem of identifying and choosing important features from the original feature set without class information. Features are ranked by importance measure scores, and the top-ranked ones are selected. Theoretical analysis indicates that the time complexity of each algorithm in the framework is nearly linear in both the number of instances and the number of features of the dataset. Experimental results on UCI datasets show that the algorithms using different scores within the framework are able to identify important features through clustering. When compared with state-of-the-art supervised and unsupervised feature selection approaches, the proposed algorithms achieve competitive results in terms of classification error rate and degree of dimensionality reduction.
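The general shape of such a framework can be illustrated with a small sketch: first cluster the unlabeled data, then score each feature by how well it separates the clusters, then rank and keep the top features. The abstract does not specify the authors' clustering method or importance scores. This sketch therefore swaps in plain k-means with deterministic farthest-first seeding and the correlation ratio (between-cluster sum of squares divided by total sum of squares) as stand-ins. The function names `kmeans`, `correlation_ratio`, and `rank_features` are hypothetical, and the toy data are invented purely for illustration.

```python
from typing import List, Sequence, Tuple


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Hypothetical stand-in clusterer: Lloyd's k-means, farthest-first seeding."""
    n = len(data)
    # Deterministic seeding: start at the first point, then repeatedly add the
    # point that is farthest from every centre chosen so far.
    centres = [list(data[0])]
    while len(centres) < k:
        far = max(range(n), key=lambda i: min(_dist2(data[i], c) for c in centres))
        centres.append(list(data[far]))
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: _dist2(p, centres[j])) for p in data]
        # Update step: move each non-empty centre to the mean of its members.
        for j in range(k):
            members = [data[i] for i in range(n) if labels[i] == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def correlation_ratio(values: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed importance score: between-cluster SS / total SS for one feature."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values)
    if total == 0.0:
        return 0.0  # a constant feature carries no cluster information
    sums: dict = {}
    counts: dict = {}
    for v, c in zip(values, labels):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    between = sum(counts[c] * (sums[c] / counts[c] - mean) ** 2 for c in sums)
    return between / total


def rank_features(
    data: Sequence[Sequence[float]], k: int, top: int
) -> Tuple[List[int], List[float]]:
    """Cluster without class labels, score every feature, keep the `top` best."""
    labels = kmeans(data, k)
    d = len(data[0])
    scores = [correlation_ratio([row[f] for row in data], labels) for f in range(d)]
    order = sorted(range(d), key=lambda f: scores[f], reverse=True)
    return order[:top], scores


# Toy data (invented): feature 0 separates two groups, feature 1 is noise,
# feature 2 is constant.
data = [
    [0.0, 1.0, 5.0],
    [0.2, 2.0, 5.0],
    [0.1, 1.5, 5.0],
    [10.0, 1.2, 5.0],
    [10.3, 1.8, 5.0],
    [9.9, 1.4, 5.0],
]
selected, scores = rank_features(data, k=2, top=1)
print(selected)  # [0]
```

Scoring each feature is a single pass over the data (O(n·d)). The k-means stand-in costs O(n·d·k) per iteration. For fixed k and iteration count, the whole sketch therefore stays linear in both the number of instances and the number of features, which is consistent with the complexity claim above. Any other per-feature score could be dropped into `rank_features` in place of the correlation ratio.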