Distance-based feature selection from probabilistic data

Authors:
Tingting Zhao; Bin Pei; Suyun Zhao; Hong Chen; Cuiping Li
Affiliations:
Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China, Department of Computer Science, Renmin University of China, China; Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China, Department of Computer Science, Renmin University of China, China; Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China; Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China, Department of Computer Science, Renmin University of China, China; Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China, Department of Computer Science, Renmin University of China, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Citing 7
Cited 0

Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning

Mining Generalized Association Rules
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

Consistency-based search in feature selection
Artificial Intelligence

Efficient Clustering of Uncertain Data
ICDM '06 Proceedings of the Sixth International Conference on Data Mining

A Survey of Uncertain Data Algorithms and Applications
IEEE Transactions on Knowledge and Data Engineering

Naive Bayes Classification of Uncertain Data
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining

Decision Trees for Uncertain Data
IEEE Transactions on Knowledge and Data Engineering

Abstract

Feature selection is a powerful tool for reducing the dimensionality of datasets. Over the last decade, it has attracted growing research attention, and some researchers have begun to study feature selection from probabilistic datasets. However, existing approaches to feature selection from probabilistic data neglect the distance hidden in the data. In this paper, we design a new distance measure for selecting informative features from probabilistic databases, one that considers both the distance and the randomness in the data. We then propose a feature selection algorithm based on the new distance and develop two accelerated algorithms to speed up the computation. Furthermore, we introduce a parameter into the distance to reduce its sensitivity to noise. Finally, experimental results verify the effectiveness of our algorithms.