Efficient link-based clustering in a large scaled blog network

Authors:
Seok-Ho Yoon;Suk-Soon Song;Sang-Wook Kim
Affiliations:
Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea
Venue:
Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
Year:
2011

References (12)

Discovering Frequent Closed Itemsets for Association Rules. ICDT '99 Proceedings of the 7th International Conference on Database Theory
SimRank: a measure of structural-context similarity. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
ReCoM: reinforcement clustering of multi-type interrelated data objects. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
CLOSET+: searching for the best strategies for mining frequent closed itemsets. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Information diffusion through blogspace. Proceedings of the 13th international conference on World Wide Web
Data Mining: Concepts and Techniques
A social hypertext model for finding community in blogs. Proceedings of the seventeenth conference on Hypertext and hypermedia
LinkClus: efficient clustering via heterogeneous semantic links. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Diva: a variance-based clustering approach for multi-type relational data. Proceedings of the sixteenth ACM conference on information and knowledge management
P-Rank: a comprehensive structural similarity measure over information networks. Proceedings of the 18th ACM conference on Information and knowledge management
Accuracy estimate and optimization techniques for SimRank computation. The VLDB Journal — The International Journal on Very Large Data Bases
Survey of clustering algorithms. IEEE Transactions on Neural Networks

Cited by (1)

From Frequent Features to Frequent Social Links. International Journal of Information System Modeling and Design

Abstract

In this paper, we address the efficient processing of link-based clustering in large-scale data environments. LinkClus is a link-based clustering method that provides good accuracy and reasonable performance. This paper first shows that LinkClus is not sufficiently scalable to be applied to a huge volume of real-world blog data. We then observe that the performance bottleneck of LinkClus lies in its initial clustering step, and we propose a new method to overcome this bottleneck. The proposed method first efficiently identifies the seed sets for initial clustering. Here, each seed set consists of a small number (two or three) of objects that are highly similar to one another. The method then assigns every remaining object to the seed set that is most similar to it. It also eliminates objects with very few links, which negatively affect accuracy, thereby also enhancing the overall processing performance. Through experiments with real-world blog data, we verify the scalability and accuracy of the proposed method.
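
The initial clustering steps described in the abstract (pruning objects with few links, forming small seed sets, assigning the remaining objects) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that the abstract does not state: Jaccard similarity over link sets stands in for the paper's similarity measure, seed sets are found by a naive pairwise scan rather than by the efficient procedure the authors propose, and the names `initial_clusters`, `min_links`, and `seed_threshold` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two link sets (an assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_clusters(links, min_links=2, seed_threshold=0.9):
    """Build initial clusters from a dict mapping object -> set of linked targets.

    Hypothetical sketch: thresholds and the similarity measure are assumptions.
    """
    # Drop objects with very few links.
    kept = {o: set(t) for o, t in links.items() if len(t) >= min_links}

    # Rank all pairs by similarity (naive O(n^2) scan, for illustration only).
    pairs = sorted(
        ((jaccard(kept[a], kept[b]), a, b)
         for a, b in combinations(sorted(kept), 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    # Form seed sets of two or three highly similar objects.
    seeds, used = [], set()
    for sim, a, b in pairs:
        if sim < seed_threshold:
            break
        if a in used or b in used:
            continue
        seed = [a, b]
        for c in sorted(kept):
            if c not in used and c not in seed and all(
                jaccard(kept[c], kept[m]) >= seed_threshold for m in seed
            ):
                seed.append(c)  # grow the pair to at most three members
                break
        seeds.append(seed)
        used.update(seed)

    if not seeds:
        # No sufficiently similar pair: fall back to singleton clusters.
        return [[o] for o in sorted(kept)]

    # Assign every remaining object to its most similar seed set,
    # measured by average similarity to the seed members.
    clusters = [list(s) for s in seeds]
    for obj in sorted(kept):
        if obj in used:
            continue
        best = max(
            range(len(seeds)),
            key=lambda i: sum(jaccard(kept[obj], kept[m]) for m in seeds[i])
            / len(seeds[i]),
        )
        clusters[best].append(obj)
    return clusters


example_links = {
    "a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"x", "y"},
    "d": {"p", "q", "r"}, "e": {"p", "q", "r"}, "f": {"p", "q"},
    "g": {"x"},  # only one link, so it is pruned before clustering
}
print(initial_clusters(example_links))
```

In this toy input, the objects stand for blogs and the link targets for whatever they link to. The pairwise scan is quadratic in the number of objects, so it would not scale to the data sizes the abstract targets; it is here only to make the three steps concrete.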