Dimensionality Reduction with Unsupervised Feature Selection and Applying Non-Euclidean Norms for Classification Accuracy

Authors:
Amit Saxena;John Wang
Affiliations:
G G University, India;Montclair State University, USA
Venue:
International Journal of Data Warehousing and Mining
Year:
2010

Abstract

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a genetic algorithm (GA) and then tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach is applied to select a subset of features: the GA stochastically selects a reduced number of features, with Sammon error as the fitness function, yielding different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, which is validated using the supervised k-nearest neighbor (k-NN) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is evaluated for CA using k-NN classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Results are presented from extensive simulations on seven real data sets and one synthetic data set. The investigation shows that using different norms produces better CA and hence offers scope for better feature subset selection.
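The two quantities the abstract relies on can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the leave-one-out evaluation, and the choice to measure Sammon error with Euclidean distances are all assumptions made for illustration, and the GA search itself is omitted. `sammon_error` scores a candidate feature subset by how well it preserves pairwise distances (the kind of fitness a GA could minimise), and `knn_predict` classifies with a Minkowski distance of order `p`, where `p = 2` is the Euclidean norm.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def sammon_error(data, subset, p=2.0):
    """Sammon's stress between pairwise distances in the full feature space
    and in the space restricted to the feature indices in `subset`."""
    reduced = [[row[j] for j in subset] for row in data]
    stress, total = 0.0, 0.0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            d_full = minkowski(data[i], data[j], p)
            if d_full == 0.0:
                continue  # skip coincident points to avoid division by zero
            d_red = minkowski(reduced[i], reduced[j], p)
            stress += (d_full - d_red) ** 2 / d_full
            total += d_full
    return stress / total if total > 0 else 0.0


def knn_predict(train_x, train_y, query, k=3, p=2.0):
    """Majority vote among the k nearest training points under the order-p norm.
    Ties are broken in favour of the label met first among the neighbours."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(data, labels, subset, k=3, p=2.0):
    """Leave-one-out classification accuracy on the reduced feature set
    (an assumed evaluation protocol, chosen only to keep the sketch short)."""
    reduced = [[row[j] for j in subset] for row in data]
    hits = 0
    for i, query in enumerate(reduced):
        rest_x = reduced[:i] + reduced[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, query, k, p) == labels[i]:
            hits += 1
    return hits / len(reduced)


# Made-up toy data: two informative features and one constant feature.
toy_data = [[0.0, 0.0, 5.0], [0.0, 1.0, 5.0], [1.0, 0.0, 5.0],
            [5.0, 5.0, 5.0], [5.0, 6.0, 5.0], [6.0, 5.0, 5.0]]
toy_labels = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    for subset in ([0, 1], [2]):
        print("subset", subset, "Sammon error:", sammon_error(toy_data, subset))
    for p in (1.0, 2.0, 3.0):
        print("p =", p, "accuracy:", loo_accuracy(toy_data, toy_labels, [0, 1], p=p))
```

In a full pipeline of the kind the abstract describes, a GA would propose many candidate values of `subset` and keep those with low Sammon error, and each surviving subset would then be scored with k-NN for several values of `p`.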