A New Dependency and Correlation Analysis for Features

Authors:
Guangzhi Qu; Salim Hariri; Mazin Yousif
Affiliations:
IEEE; IEEE; IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

The quality of the data being analyzed is a critical factor that affects the accuracy of data mining algorithms. Two important aspects of data quality are relevance and redundancy. Including irrelevant and redundant features in a data mining model results in poor predictions and high computational overhead. This paper presents an efficient method that considers both the relevance of individual features and the pairwise correlation between features in order to improve the prediction accuracy of our data mining algorithm. We introduce a new feature correlation metric, Q_Y(X_i, X_j), and a feature subset merit measure, e(S), to quantify the relevance of features and the correlation among them with respect to a desired data mining task (e.g., detecting abnormal behavior in a network service caused by network attacks). Our approach takes into consideration not only the dependency among the features, but also their dependency with respect to a given data mining task. Our analysis shows that the correlation relationship among features depends on the decision task and, thus, the features display different behaviors as the decision task changes. We applied our data mining approach to network security and validated it using the DARPA KDD99 benchmark data set. Our results show that, using the new decision-dependent correlation metric, we can efficiently detect rare network attacks such as User to Root (U2R) and Remote to Local (R2L) attacks. The best previously reported detection rates for U2R and R2L attacks on the KDD99 data sets were 13.2 percent and 8.4 percent, respectively, with a 0.5 percent false alarm rate. For U2R attacks, our approach achieves a 92.5 percent detection rate with a false alarm rate of 0.7587 percent. For R2L attacks, it achieves a 92.47 percent detection rate with a false alarm rate of 8.35 percent.
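The abstract names the metric Q_Y(X_i, X_j) but does not give its formula. To illustrate the central claim, that the correlation between two features can change with the decision task, the sketch below uses an assumed information-theoretic stand-in. It computes the co-information I(X_i;Y) + I(X_j;Y) - I(X_i,X_j;Y) and normalizes it by the entropy H(Y). This is not the paper's definition of Q_Y, and the function names (`entropy`, `mutual_information`, `task_dependent_redundancy`) are hypothetical. The sketch also assumes that all features are already discrete.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where xs and ys are lists of columns."""
    return entropy(*xs) + entropy(*ys) - entropy(*xs, *ys)


def task_dependent_redundancy(xi, xj, y):
    """Hypothetical stand-in for Q_Y(X_i, X_j), NOT the paper's formula.

    Co-information of X_i and X_j with the decision variable Y, divided by H(Y):
      > 0  the two features carry overlapping information about Y (redundant)
      < 0  the two features say more about Y together (complementary)
    """
    h_y = entropy(y)
    if h_y == 0:
        return 0.0  # a constant decision variable carries no information
    shared = (mutual_information([xi], [y])
              + mutual_information([xj], [y])
              - mutual_information([xi, xj], [y]))
    return shared / h_y


if __name__ == "__main__":
    xi = [0, 0, 1, 1]
    xj = [0, 1, 0, 1]
    # Same feature pair, two different decision tasks.
    print(task_dependent_redundancy(xi, xj, y=[0, 0, 1, 1]))  # task Y = X_i       -> 0.0
    print(task_dependent_redundancy(xi, xj, y=[0, 1, 1, 0]))  # task Y = X_i XOR X_j -> -1.0
```

The same pair of features scores 0.0 when the task is to predict X_i itself, because X_j adds nothing and overlaps with nothing. It scores -1.0 when the task is X_i XOR X_j, where neither feature alone is informative about Y but the two together determine it. Applying a measure like this to KDD99 would first require discretizing the continuous attributes, which this sketch does not do.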