Efficiently handling feature redundancy in high-dimensional data

Authors:
Lei Yu; Huan Liu
Affiliations:
Arizona State University, Tempe, AZ; Arizona State University, Tempe, AZ
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used preprocessing technique that makes high-dimensional data amenable to data mining. Traditionally, feature selection has focused on removing irrelevant features. For high-dimensional data, however, removing redundant features is equally critical. In this paper, we study feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. An extensive empirical study on real-world data shows that the proposed approach is both efficient and effective at removing redundant and irrelevant features.
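
To make the idea of a correlation-based filter concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not name a correlation measure or a redundancy test, so the sketch assumes symmetrical uncertainty (a normalized form of mutual information) for discrete features, and assumes that a feature is redundant when an already-kept feature is at least as correlated with it as it is with the class. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, labels, threshold=0.0):
    """Greedy filter: keep relevant features, skip redundant ones.

    columns: dict mapping feature name -> list of discrete values.
    Returns kept feature names, most class-relevant first.
    """
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in columns.items()}
    # Relevance step: drop features whose SU with the class is <= threshold.
    ranked = sorted((n for n in columns if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    for name in ranked:
        # Redundancy step (assumed heuristic): skip a feature if some kept
        # feature is at least as correlated with it as it is with the class.
        redundant = any(
            symmetrical_uncertainty(columns[kept], columns[name])
            >= relevance[name]
            for kept in selected)
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical toy data: f2 duplicates f1, f3 is independent of the class.
data = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 0, 1, 1, 1, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_features(data, labels))  # prints ['f1']
```

Because it scores features without training a classifier, this sketch follows the filter model rather than the wrapper model. In the toy data, `f3` is discarded as irrelevant and `f2` as redundant with `f1`.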