FARS: A Multi-relational Feature and Relation Selection Approach for Efficient Classification

Authors:
Bo Hu; Hongyan Liu; Jun He; Xiaoyong Du
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, MOE, China, and Information School, Renmin University of China, Beijing, China 100872; School of Economics and Management, Tsinghua University, Beijing, China 100084; Key Labs of Data Engineering and Knowledge Engineering, MOE, China, and Information School, Renmin University of China, Beijing, China 100872; Key Labs of Data Engineering and Knowledge Engineering, MOE, China, and Information School, Renmin University of China, Beijing, China 100872
Venue:
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Year:
2008

Abstract

Feature selection is an essential data-processing step that removes irrelevant and redundant attributes to shorten learning time, improve accuracy, and make learned models easier to understand. Many algorithms have been proposed in both the data mining and machine learning communities. These algorithms usually assume a single-table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables linked to each other by semantic relationships. To solve this problem, in this paper we propose a novel approach called FARS that performs both feature and relation selection for efficient multi-relational classification. This approach not only extends traditional feature selection methods to select relevant features from multiple relations, but also introduces a new method that reconstructs the multi-relational database schema and removes irrelevant tables to further improve classification performance. Experiments on several real databases show that FARS can effectively choose a small set of relevant features, significantly improving classification efficiency while also improving prediction accuracy.
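To make the general idea of combined feature and relation selection concrete, here is a minimal sketch under stated assumptions. It is NOT the FARS algorithm: the abstract does not specify FARS's relevance measure or its schema-reconstruction step. This sketch swaps in plain information gain as the relevance score. It also assumes a hypothetical target table with `id` and `class` columns, and non-target relations that reference it through a foreign key. Class labels are propagated to each non-target row through that key, every attribute is scored against the label, attributes below an arbitrary `threshold` are dropped, and any non-target relation left with no selected attribute is pruned.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of a discrete attribute with respect to the class."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def propagate_labels(target, rows, fk):
    """Pair each row of a non-target relation with the class label of the
    target tuple it references through the foreign key `fk` (hypothetical)."""
    label_of = {t["id"]: t["class"] for t in target}
    return [(r, label_of[r[fk]]) for r in rows if r[fk] in label_of]


def select_features_and_relations(target, relations, threshold=0.1):
    """Keep attributes whose information gain reaches `threshold` and drop any
    non-target relation left with no selected attribute.

    relations: name -> (rows, foreign-key column referencing target "id").
    Returns (selected attributes per table, names of relations kept).
    """
    labelled = {"target": ([(t, t["class"]) for t in target], {"id", "class"})}
    for name, (rows, fk) in relations.items():
        labelled[name] = (propagate_labels(target, rows, fk), {fk})

    # The target table carries the class labels, so it is always kept.
    features, kept = {}, ["target"]
    for name, (pairs, key_cols) in labelled.items():
        if not pairs:
            continue
        labels = [y for _, y in pairs]
        attrs = [a for a in pairs[0][0] if a not in key_cols]
        chosen = [a for a in attrs
                  if info_gain([r[a] for r, _ in pairs], labels) >= threshold]
        features[name] = chosen
        if chosen and name != "target":
            kept.append(name)
    return features, kept


# Toy data (invented for illustration only).
customers = [
    {"id": 1, "class": "yes", "age": "young"},
    {"id": 2, "class": "no", "age": "young"},
    {"id": 3, "class": "yes", "age": "old"},
    {"id": 4, "class": "no", "age": "old"},
]
loans = [
    {"cust": 1, "status": "ok"},
    {"cust": 1, "status": "ok"},
    {"cust": 2, "status": "late"},
    {"cust": 3, "status": "ok"},
    {"cust": 4, "status": "late"},
]
branches = [
    {"cust": 1, "city": "A"},
    {"cust": 2, "city": "A"},
    {"cust": 3, "city": "B"},
    {"cust": 4, "city": "B"},
]

features, kept = select_features_and_relations(
    customers, {"loans": (loans, "cust"), "branches": (branches, "cust")})
print(features)  # {'target': [], 'loans': ['status'], 'branches': []}
print(kept)      # ['target', 'loans']
```

In this toy run, `loans.status` separates the classes perfectly and is kept, while `customers.age` and `branches.city` carry no information about the class. As a result, the `branches` relation is pruned entirely. One design caveat: propagating labels to every row means a target tuple with many related rows (customer 1 has two loans) is weighted more heavily. A real multi-relational method would need to decide how to handle that.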