Clustering of relational data has so far received far less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which makes it possible to apply standard clustering algorithms such as k-means to multi-relational data. We describe how random rules are generated and then turned into boolean-valued features. Clustering is generally not straightforward to evaluate, but preliminary experiments on a number of standard ILP datasets are promising: clusters generated without class information usually agree well with the true class labels of their members, i.e. the class distributions inside clusters generally differ significantly from the global class distribution. The two-tiered algorithm scales well, owing to the randomized nature of the first step and the availability of efficient propositional clustering algorithms for the second step.
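The two-tiered pipeline (random rules turned into boolean features, then a propositional clusterer) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' rule generator: each example is assumed to be a bag of related tuples (e.g. the atoms of a molecule), and a hypothetical "random rule" is an existential conjunction of attribute = value tests on one related tuple. Plain Lloyd's k-means stands in for the second step. A Pearson chi-square statistic of the cluster x class table is included as one common way to check whether per-cluster class distributions depart from the global one; it is not necessarily the test used in the paper. All function names are hypothetical.

```python
import random
from collections import Counter


def random_rule(rng, attributes, values_by_attr, max_conds=2):
    """Draw a hypothetical random rule: a list of (attribute, value) tests
    that must all hold for at least one related tuple of an example."""
    k = rng.randint(1, min(max_conds, len(attributes)))
    attrs = rng.sample(attributes, k)
    return [(a, rng.choice(values_by_attr[a])) for a in attrs]


def rule_holds(rule, example):
    # Existential semantics: true if some related tuple satisfies every test.
    return any(all(t.get(a) == v for a, v in rule) for t in example)


def propositionalize(examples, rules):
    # One boolean-valued feature (encoded as 0.0/1.0) per random rule.
    return [[1.0 if rule_holds(r, ex) else 0.0 for r in rules] for ex in examples]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means on the propositional feature vectors."""
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def chi_square(assign, labels):
    """Pearson chi-square of the cluster x class table against independence
    (i.e. against every cluster having the global class distribution)."""
    n = len(labels)
    cluster_tot, class_tot = Counter(assign), Counter(labels)
    joint = Counter(zip(assign, labels))
    stat = 0.0
    for c in cluster_tot:
        for y in class_tot:
            expected = cluster_tot[c] * class_tot[y] / n
            stat += (joint[(c, y)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (invented for illustration): "active" molecules contain a
    # charged nitrogen atom, "inactive" ones do not.
    active = [[{"element": "c", "charged": False},
               {"element": "n", "charged": True}] for _ in range(5)]
    inactive = [[{"element": "c", "charged": False},
                 {"element": "o", "charged": False}] for _ in range(5)]
    examples = active + inactive
    labels = ["active"] * 5 + ["inactive"] * 5
    values = {"element": ["c", "n", "o"], "charged": [True, False]}
    rules = [random_rule(rng, list(values), values) for _ in range(20)]
    clusters = kmeans(propositionalize(examples, rules), 2, rng)
    print(clusters, chi_square(clusters, labels))
```

The class labels are used only after clustering, mirroring the evaluation described above. Because the first step only samples and evaluates rules, its cost grows with the number of rules times the size of the data, and any efficient propositional clusterer can replace the toy k-means.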