Clustering relational data based on randomized propositionalization

Authors:
Grant Anderson; Bernhard Pfahringer
Affiliations:
Department of Computer Science, University of Waikato, Hamilton, New Zealand (both authors)
Venue:
ILP'07: Proceedings of the 17th International Conference on Inductive Logic Programming
Year:
2007

Abstract

Clustering of relational data has so far received much less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which makes it possible to apply standard clustering algorithms such as k-means to multi-relational data. We describe how random rules are generated and then turned into Boolean-valued features. Clustering is generally not straightforward to evaluate, but preliminary experiments on a number of standard ILP datasets are promising. Clusters generated without class information usually agree well with the true class labels of their members; that is, the class distributions inside clusters generally differ significantly from the global class distribution. The two-tier algorithm scales well, owing to the randomized nature of the first step and the availability of efficient propositional clustering algorithms for the second step.
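The abstract does not spell out the rule language, so what follows is only a minimal sketch of the two-tier idea under stated assumptions. It uses a hypothetical molecule schema (atoms and bonds), a hypothetical rule language of one- or two-literal existential conjunctions (`random_rule`, `covers`), and plain Lloyd's k-means (`kmeans`) standing in for whichever k-means implementation the authors used. All function names and the toy data generator `make_toy_data` are illustrative inventions. Each random rule becomes one Boolean feature (1 if the rule covers the example), and the resulting 0/1 table is clustered without looking at class labels. The labels are used only afterwards, to compare each cluster's class counts with the global distribution, mirroring the evaluation described above.

```python
import random
from collections import Counter

# Hypothetical toy schema: a molecule is {"atoms": [(atom_id, element)],
# "bonds": [(atom_id, atom_id)]}; bonds are treated as undirected.
ELEMENTS = ["c", "n", "o", "h"]


def random_rule(rng, elements=ELEMENTS):
    """Draw a random existential rule: (X,) means 'some atom has element X';
    (X, Y) means 'some atom of element X is bonded to one of element Y'."""
    if rng.random() < 0.5:
        return (rng.choice(elements),)
    return (rng.choice(elements), rng.choice(elements))


def covers(rule, molecule):
    """Return True if the rule holds for (covers) the molecule."""
    element_of = dict(molecule["atoms"])  # atom_id -> element
    if len(rule) == 1:
        return rule[0] in element_of.values()
    x, y = rule
    return any((element_of[a], element_of[b]) in ((x, y), (y, x))
               for a, b in molecule["bonds"])


def propositionalize(molecules, n_rules, seed=0):
    """Step 1: one Boolean (0/1) feature per random rule, one row per example."""
    rng = random.Random(seed)
    rules = [random_rule(rng) for _ in range(n_rules)]
    table = [[int(covers(r, m)) for r in rules] for m in molecules]
    return rules, table


def kmeans(rows, k, iters=20, seed=0):
    """Step 2: plain Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [[float(v) for v in row] for row in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest center.
        labels = [min(range(k),
                      key=lambda c: sum((v - w) ** 2 for v, w in zip(row, centers[c])))
                  for row in rows]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_toy_data(n, seed=1):
    """Hypothetical data: 6-atom chains; 'pos' molecules draw atoms from
    {c, n, o}, 'neg' molecules from {c, h}; classes alternate."""
    rng = random.Random(seed)
    molecules, classes = [], []
    for i in range(n):
        pos = i % 2 == 0
        pool = ["c", "n", "o"] if pos else ["c", "h"]
        atoms = [(j, rng.choice(pool)) for j in range(6)]
        bonds = [(j, j + 1) for j in range(5)]
        molecules.append({"atoms": atoms, "bonds": bonds})
        classes.append("pos" if pos else "neg")
    return molecules, classes


if __name__ == "__main__":
    molecules, classes = make_toy_data(40)
    rules, table = propositionalize(molecules, n_rules=30)
    labels = kmeans(table, k=2)
    # Class labels are used only here, for evaluation: compare each cluster's
    # class counts with the global 20 pos / 20 neg split.
    for c in range(2):
        print(c, Counter(cl for cl, lab in zip(classes, labels) if lab == c))
```

In this sketch the rules are drawn at random rather than searched for, so the first step costs one coverage test per rule and example. Any off-the-shelf propositional clusterer could replace the `kmeans` helper in the second step.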