Efficient link-based clustering in a large scaled blog network

  • Authors:
  • Seok-Ho Yoon;Suk-Soon Song;Sang-Wook Kim

  • Affiliations:
  • Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea

  • Venue:
  • Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2011

Abstract

In this paper, we address the efficient processing of link-based clustering in large-scale data environments. LinkClus is a link-based clustering method that provides good accuracy and reasonable performance. This paper first shows that LinkClus is not sufficiently scalable to be applied to a huge volume of real-world blog data. We then observe that the performance bottleneck of LinkClus lies in its initial clustering step, and we propose a new method to overcome this bottleneck. The proposed method first efficiently identifies the seed sets for initial clustering. Here, each seed set consists of a small number (two or three) of objects that are highly similar to one another. The method then assigns every remaining object to the seed set that is most similar to it. It also eliminates objects with very few links, which negatively affect accuracy, thereby improving overall processing performance. Through experiments with real-world blog data, we verify the scalability and accuracy of the proposed method.
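The three steps described in the abstract (dropping sparsely linked objects, forming small seed sets, and assigning the remaining objects) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `jaccard` and `initial_clusters`, the parameters `min_links` and `seed_threshold`, the use of Jaccard similarity over link sets, and the greedy pairing rule are all assumptions made for illustration; the abstract does not specify any of them. The sketch also compares all pairs of objects, so it does not reproduce the efficiency of the proposed method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two link sets (an assumed measure, not taken from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_clusters(links, min_links=2, seed_threshold=0.8):
    """Build initial clusters from a dict mapping each object to its set of links.

    All parameter names and default values are hypothetical.
    """
    # Step 1: eliminate objects that have very few links.
    kept = {obj: ls for obj, ls in links.items() if len(ls) >= min_links}

    # Step 2: greedily pick disjoint pairs of highly similar objects as seed sets.
    # Seeds are limited to two objects here; the abstract allows two or three.
    scored = sorted(
        ((jaccard(kept[a], kept[b]), a, b) for a, b in combinations(sorted(kept), 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    seeds, used = [], set()
    for sim, a, b in scored:
        if sim < seed_threshold:
            break  # remaining pairs are even less similar
        if a not in used and b not in used:
            seeds.append([a, b])
            used.update((a, b))
    if not seeds:
        return []

    # Step 3: assign every remaining object to the seed set it is most similar to,
    # measured as the average similarity to that seed set's members.
    clusters = [list(seed) for seed in seeds]
    for obj in sorted(kept):
        if obj in used:
            continue
        best = max(
            range(len(seeds)),
            key=lambda i: sum(jaccard(kept[obj], kept[m]) for m in seeds[i]) / len(seeds[i]),
        )
        clusters[best].append(obj)
    return clusters
```

For example, given six blogs that fall into two groups sharing the same outgoing links, plus one blog with a single link, the sketch would drop the single-link blog, seed each group with its two most similar members, and attach the third member of each group to the matching seed set.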