Dimensionality reduction in data summarization approach to learning relational data

Authors:
Chung Seng Kheau;Rayner Alfred;Lau Hui Keng
Affiliations:
School of Engineering and Information Technology, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia;School of Engineering and Information Technology, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia;School of Engineering and Information Technology, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia
Venue:
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part I
Year:
2013

Abstract

As the amount of digital data stored in relational databases grows, new approaches are needed for learning from relational data. The DARA algorithm is one such approach in relational data mining: it summarizes data in order to handle one-to-many relations. DARA transforms data stored in relational databases into a vector space representation by applying information retrieval theory. Experimental results have shown DARA to be very effective in learning relational data. However, DARA suffers a major drawback when the cardinalities of attributes are very high, because the size of the vector space representation depends on the number of unique values that exist for all attributes in the dataset. This paper investigates how the predictive accuracy of the overall classification task is affected by discretizing the magnitudes of the computed terms and by applying a feature selection process that reduces the cardinalities of attributes of the relational datasets. This involves finding the best set of relevant features with which to summarize the data, where feature selection is performed based on the previously computed term magnitudes. The results show that the predictive accuracy of the classification task can be improved by improving the quality of the summarized data. That quality can be enhanced by appropriately discretizing the term magnitudes and by selecting only a certain percentage of the attributes.
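The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' DARA implementation. The following are assumptions made only for illustration: the TF-IDF weighting, the ranking of terms by total weight, the equal-width binning, the order of the two reduction steps, the toy rows, and all function names. Each record on the "one" side of a one-to-many relation becomes a bag of attribute=value terms, the terms are weighted, only a fraction of the terms is kept, and the remaining weights are discretized.

```python
import math
from collections import Counter, defaultdict


def build_bags(rows):
    """Group (record_id, attribute, value) rows into one bag of terms per record."""
    bags = defaultdict(Counter)
    for record_id, attribute, value in rows:
        bags[record_id][f"{attribute}={value}"] += 1
    return dict(bags)


def tf_idf(bags):
    """Weight each term by term frequency times log(N / document frequency)."""
    n = len(bags)
    df = Counter(term for bag in bags.values() for term in bag)
    return {
        rid: {t: tf * math.log(n / df[t]) for t, tf in bag.items()}
        for rid, bag in bags.items()
    }


def select_top_terms(weights, fraction):
    """Keep a fraction of the terms, ranked by total weight over all records."""
    totals = Counter()
    for vec in weights.values():
        for term, w in vec.items():
            totals[term] += w
    keep = max(1, round(len(totals) * fraction))
    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return set(ranked[:keep])


def discretize(weights, kept, n_bins):
    """Equal-width binning of the kept weights into integers 0..n_bins-1."""
    values = [w for vec in weights.values() for t, w in vec.items() if t in kept]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal weights
    return {
        rid: {
            t: min(int((w - lo) / width), n_bins - 1)
            for t, w in vec.items()
            if t in kept
        }
        for rid, vec in weights.items()
    }


# Hypothetical toy data: three molecules, each related to several atoms.
rows = [
    ("m1", "element", "c"), ("m1", "element", "c"), ("m1", "element", "o"),
    ("m2", "element", "c"), ("m2", "element", "n"),
    ("m3", "element", "n"), ("m3", "element", "n"), ("m3", "element", "o"),
]

weights = tf_idf(build_bags(rows))
kept = select_top_terms(weights, fraction=0.67)   # keep about two thirds of the terms
summary = discretize(weights, kept, n_bins=2)     # two magnitude levels per term
print(sorted(kept))
print(summary)
```

In this sketch the width of the vector space falls from three terms to two, and each remaining weight takes one of two values. That is the kind of reduction in attribute cardinality the abstract refers to. Which fraction and which number of bins work best is an empirical question that the sketch does not answer.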