Discovering Knowledge from Multi-relational Data Based on Information Retrieval Theory

Authors:
Rayner Alfred
Affiliations:
Center for Artificial Intelligence, Universiti Malaysia Sabah, Sabah, Malaysia 88999
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

Although the TF-IDF weighted frequency matrix (vector space model) has been widely studied and used in document clustering and document categorisation, there has been no attempt to extend this application to relational data that contain one-to-many associations between records. This paper explains the rationale for using TF-IDF (term frequency-inverse document frequency), a technique for weighting data attributes borrowed from Information Retrieval theory, to summarise datasets stored in a multi-relational setting with one-to-many relationships. A novel data summarisation algorithm based on TF-IDF, referred to as Dynamic Aggregation of Relational Attributes (DARA), is introduced. The DARA algorithm applies clustering techniques in order to summarise these datasets. The experimental results show that the DARA algorithm finds solutions with much greater accuracy.
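
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the DARA algorithm itself, whose details the abstract does not give. It assumes each record on the "one" side (here a hypothetical molecule ID) is treated as a document, and each value in its associated "many"-side rows (here a hypothetical atom type) is treated as a term. The function names `tfidf_bags` and `cosine`, the sample rows, and the choice of raw term counts with `log(N / df)` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def tfidf_bags(rows):
    """Build one TF-IDF vector per parent record.

    rows: iterable of (parent_id, term) pairs taken from the 'many' side
    of a one-to-many relationship (hypothetical layout).
    Returns {parent_id: {term: tf * log(N / df)}}.
    """
    # Term frequency: count each term within each parent's bag of rows.
    bags = defaultdict(Counter)
    for parent_id, term in rows:
        bags[parent_id][term] += 1

    # Document frequency: number of parents whose bag contains the term.
    n_parents = len(bags)
    df = Counter()
    for counts in bags.values():
        df.update(counts.keys())

    return {
        pid: {t: tf * math.log(n_parents / df[t]) for t, tf in counts.items()}
        for pid, counts in bags.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sample data: (molecule_id, atom_type) rows.
rows = [
    ("m1", "c"), ("m1", "c"), ("m1", "o"),
    ("m2", "c"), ("m2", "n"),
    ("m3", "n"), ("m3", "n"),
]
weights = tfidf_bags(rows)
sim_m1_m2 = cosine(weights["m1"], weights["m2"])  # share the term "c"
sim_m1_m3 = cosine(weights["m1"], weights["m3"])  # share no terms
```

Each parent record ends up as a single weighted vector, so the one-to-many rows are summarised in a form that a standard clustering method could consume, for example through the cosine similarity above. Which clustering method and similarity measure DARA actually uses is not stated in this abstract.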