Although the TF-IDF weighted frequency matrix (vector space model) has been widely studied and used in document clustering and document categorisation, there has been no attempt to extend it to relational data that contain one-to-many associations between records. This paper explains the rationale for using TF-IDF (term frequency-inverse document frequency), a technique for weighting data attributes borrowed from Information Retrieval theory, to summarise datasets stored in a multi-relational setting with one-to-many relationships. A novel data summarisation algorithm based on TF-IDF, referred to as Dynamic Aggregation of Relational Attributes (DARA), is introduced. The DARA algorithm applies clustering techniques to summarise these datasets. The experimental results show that the DARA algorithm finds solutions with much greater accuracy.
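To make the idea concrete, the following is a minimal sketch of TF-IDF weighting applied to a one-to-many relation. It is not the DARA implementation. It assumes that each parent record is treated as a "document" whose "terms" are the attribute values of its child rows, and that the weight is the textbook tf × log(N/df). The function names (`tfidf_vectors`, `cosine`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(rows):
    """Build one sparse TF-IDF vector per parent record.

    rows: iterable of (parent_id, term) pairs taken from the "many" side
    of a one-to-many relation (assumed input layout, not from the paper).
    """
    # Collect a bag of terms for each parent record.
    bags = defaultdict(Counter)
    for parent_id, term in rows:
        bags[parent_id][term] += 1

    n_records = len(bags)

    # Document frequency: number of parent records whose bag contains the term.
    df = Counter()
    for bag in bags.values():
        df.update(bag.keys())

    # Weight each term by tf * log(N / df).
    return {
        parent_id: {t: tf * math.log(n_records / df[t]) for t, tf in bag.items()}
        for parent_id, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: three parent records, each with several child rows.
rows = [
    ("m1", "c"), ("m1", "c"), ("m1", "o"),
    ("m2", "c"), ("m2", "n"),
    ("m3", "n"), ("m3", "n"),
]
vectors = tfidf_vectors(rows)
sim_12 = cosine(vectors["m1"], vectors["m2"])  # records share the term "c"
sim_13 = cosine(vectors["m1"], vectors["m3"])  # records share no terms
```

Vectors of this kind could then be passed to any distance-based clustering method, with the resulting cluster label serving as a single summarising feature for each parent record. Which clustering method and which distance measure are appropriate is left open here.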