Diva: a variance-based clustering approach for multi-type relational data

Authors:
Tao Li; Sarabjot Singh Anand
Affiliations:
University of Warwick, Coventry, United Kingdom; University of Warwick, Coventry, United Kingdom
Venue:
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Year:
2007

Abstract

Clustering is a common unsupervised learning technique for extracting knowledge from a dataset. In contrast to classical propositional approaches, which handle only simple, flat datasets, relational clustering can operate directly on multi-type interrelated data objects and exploit the semantic information hidden in the linkage structure to improve the clustering result. However, exploring linkage information greatly reduces the scalability of relational clustering. Moreover, some properties of vector data spaces that are used to accelerate propositional clustering no longer hold in relational data spaces. These two disadvantages prevent relational clustering techniques from being applied to very large datasets or to time-critical tasks such as online recommender systems. In this paper we propose a new variance-based clustering algorithm to address these difficulties. Our algorithm combines the advantages of the divisive and agglomerative clustering paradigms to improve the quality of the clustering results. By adopting the idea of a Representative Object, it can be executed with linear time complexity. Experimental results show that our algorithm achieves high accuracy, efficiency and robustness in comparison with some well-known relational clustering approaches.
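The abstract names three ingredients without giving details: a variance criterion, a divisive phase followed by an agglomerative phase, and a Representative Object that stands in for a cluster centre when only pairwise distances are available. The Python sketch below is one possible reading of how these pieces could fit together; it is not the authors' algorithm. The function names, the sampling-based choice of representative, the farthest-pair split, the `max_variance` threshold and the greedy merge rule are all assumptions made for illustration, and the sketch makes no claim to the linear time bound stated in the abstract.

```python
import random

# Illustrative sketch only: every name and rule below is an assumption,
# not the method described in the paper.

def variance_about(cluster, rep, dist):
    """Mean squared distance from each member to one representative object."""
    return sum(dist(x, rep) ** 2 for x in cluster) / len(cluster)

def pick_representative(cluster, dist, rng, sample_size=5):
    """Assumed heuristic: among a few sampled members, keep the one that
    gives the smallest variance. No vector mean is ever computed."""
    candidates = rng.sample(cluster, min(sample_size, len(cluster)))
    return min(candidates, key=lambda c: variance_about(cluster, c, dist))

def divide(cluster, dist, max_variance, rng):
    """Divisive phase: split recursively until the variance is small enough."""
    rep = pick_representative(cluster, dist, rng)
    if len(cluster) < 2 or variance_about(cluster, rep, dist) <= max_variance:
        return [cluster]
    # Assumed split rule: two far-apart pivots, members join the nearer one.
    a = max(cluster, key=lambda x: dist(x, rep))
    b = max(cluster, key=lambda x: dist(x, a))
    left = [x for x in cluster if dist(x, a) <= dist(x, b)]
    right = [x for x in cluster if dist(x, a) > dist(x, b)]
    if not left or not right:
        return [cluster]
    return (divide(left, dist, max_variance, rng)
            + divide(right, dist, max_variance, rng))

def merge(clusters, dist, max_variance, rng):
    """Agglomerative phase: greedily merge pairs whose union stays compact."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                union = clusters[i] + clusters[j]
                rep = pick_representative(union, dist, rng)
                if variance_about(union, rep, dist) <= max_variance:
                    clusters[i] = union
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

def cluster_objects(objects, dist, max_variance, seed=0):
    """Run the divisive phase, then the agglomerative phase."""
    rng = random.Random(seed)
    return merge(divide(list(objects), dist, max_variance, rng),
                 dist, max_variance, rng)

if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    groups = cluster_objects(points, lambda p, q: abs(p - q), max_variance=1.0)
    print(sorted(sorted(g) for g in groups))
```

Because `dist` is an arbitrary callable, the same sketch applies to objects that have no vector representation, provided a symmetric pairwise distance can be computed from their attributes and links. Measuring variance against a single representative costs one pass over the cluster, whereas an all-pairs measure would cost a quadratic number of distance evaluations.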