Diva: a variance-based clustering approach for multi-type relational data

  • Authors:
  • Tao Li; Sarabjot Singh Anand

  • Affiliations:
  • University of Warwick, Coventry, United Kingdom; University of Warwick, Coventry, United Kingdom

  • Venue:
  • Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

Clustering is a common unsupervised learning technique for extracting knowledge from a dataset. In contrast to classical propositional approaches, which handle only simple, flat datasets, relational clustering can handle multi-type interrelated data objects directly and exploit the semantic information hidden in the linkage structure to improve the clustering result. However, exploring linkage information greatly reduces the scalability of relational clustering. Moreover, some properties of vector data spaces that are used to accelerate propositional clustering no longer hold in relational data spaces. These two disadvantages prevent relational clustering techniques from being applied to very large datasets or to time-critical tasks such as online recommender systems. In this paper we propose a new variance-based clustering algorithm to address these difficulties. Our algorithm combines the advantages of the divisive and agglomerative clustering paradigms to improve the quality of the clustering results. By adopting the idea of Representative Objects, it runs with linear time complexity. Experimental results show that our algorithm achieves high accuracy, efficiency and robustness in comparison with some well-known relational clustering approaches.