Discovering Knowledge from Multi-relational Data Based on Information Retrieval Theory

  • Authors:
  • Rayner Alfred

  • Affiliations:
  • Center for Artificial Intelligence, Universiti Malaysia Sabah, Sabah, Malaysia 88999

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

  • Year:
  • 2009

Abstract

Although the TF-IDF (term frequency-inverse document frequency) weighted frequency matrix, or vector space model, has been widely studied and used in document clustering and document categorisation, there has been no attempt to extend this application to relational data that contain one-to-many associations between records. This paper explains the rationale for using TF-IDF, an attribute-weighting technique borrowed from Information Retrieval theory, to summarise datasets stored in a multi-relational setting with one-to-many relationships. It introduces a novel TF-IDF-based data summarisation algorithm called Dynamic Aggregation of Relational Attributes (DARA), which applies clustering techniques to summarise these datasets. The experimental results show that the DARA algorithm finds solutions with much greater accuracy.
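The abstract does not spell out DARA's individual steps, so the following is only a minimal sketch of the general idea it describes, not the paper's algorithm. It assumes that each parent record's related child rows are flattened into a bag of hypothetical "attribute=value" terms and treated as one "document". It also assumes these bags are weighted with one common TF-IDF variant (tf = count / bag size, idf = log(N / df)) and then grouped with a simple k-means variant that assigns records by cosine similarity. The function names, the farthest-first seeding, and the customer/transaction data are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical sketch, NOT the DARA implementation from the paper: it only
# illustrates the "TF-IDF over one-to-many data, then cluster" idea.

def bags_from_one_to_many(rows, key, attrs):
    """Group child rows by parent key; each parent's 'attr=value' terms form one 'document'."""
    bags = {}
    for row in rows:
        terms = bags.setdefault(row[key], [])
        terms.extend(f"{a}={row[a]}" for a in attrs)
    return bags


def tfidf(bags):
    """Assumed variant: tf = count / bag size, idf = log(N / df)."""
    n = len(bags)
    df = Counter()
    for terms in bags.values():
        df.update(set(terms))  # document frequency: count each term once per bag
    return {
        parent: {t: (c / len(terms)) * math.log(n / df[t]) for t, c in Counter(terms).items()}
        for parent, terms in bags.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, iters=20):
    """k-means variant: assign by cosine similarity to each cluster's mean vector."""
    ids = sorted(vectors)
    k = min(k, len(ids))
    centroids = [dict(vectors[ids[0]])]
    while len(centroids) < k:
        # Farthest-first seeding: next seed is the record least similar to all seeds so far.
        far = min(ids, key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(dict(vectors[far]))
    labels = {}
    for _ in range(iters):
        # Assign each record to its most similar centroid (ties go to the lower index).
        new = {i: max(range(k), key=lambda c: cosine(vectors[i], centroids[c])) for i in ids}
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(k):
            members = [vectors[i] for i in ids if labels[i] == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # adds the float weights term by term
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


# Hypothetical one-to-many data: each customer (parent) has several transactions (children).
transactions = [
    {"cust": "c1", "item": "bread", "pay": "cash"},
    {"cust": "c1", "item": "milk", "pay": "cash"},
    {"cust": "c2", "item": "bread", "pay": "cash"},
    {"cust": "c2", "item": "milk", "pay": "cash"},
    {"cust": "c3", "item": "tv", "pay": "card"},
    {"cust": "c4", "item": "tv", "pay": "card"},
    {"cust": "c4", "item": "laptop", "pay": "card"},
]
vectors = tfidf(bags_from_one_to_many(transactions, "cust", ["item", "pay"]))
labels = kmeans(vectors, k=2)
print(labels)  # {'c1': 0, 'c2': 0, 'c3': 1, 'c4': 1}
```

Treating each attribute-value pair as a term lets the IDF factor down-weight values shared by many parent records and up-weight rarer ones (here `item=laptop`, which appears for only one customer). Each customer's variable number of transactions thus becomes a single fixed-form vector that an ordinary clustering method can consume.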