In this paper, we address the efficient processing of link-based clustering in large-scale data environments. LinkClus is a link-based clustering method that provides good accuracy and reasonable performance. We first show that LinkClus does not scale well enough to be applied to a huge volume of real-world blog data, and we observe that its performance bottleneck lies in the initial clustering step. We propose a new method that overcomes this bottleneck. The proposed method first efficiently identifies the seed sets for initial clustering, where each seed set consists of a small number (two or three) of objects that are highly similar to one another. It then assigns every remaining object to the seed set most similar to it. It also eliminates objects with very few links, which negatively affect accuracy, thereby also improving overall processing performance. Through experiments with real-world blog data, we verify the scalability and accuracy of the proposed method.
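The three steps described above (dropping sparsely linked objects, forming small seed sets, and attaching each remaining object to its most similar seed set) can be illustrated with a minimal sketch. This is not the paper's algorithm: the Jaccard similarity over link sets, the function name `initial_clusters`, and the parameters `min_links` and `seed_threshold` are all assumptions made for illustration. In particular, the seed search below compares all pairs of objects and only forms seeds of size two, which does not reflect the efficient seed identification the abstract refers to.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two link sets (an assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_clusters(links, min_links=2, seed_threshold=0.5):
    """Hypothetical seed-based initial clustering.

    links maps each object to the set of items it links to.
    min_links and seed_threshold are illustrative parameters.
    """
    # Step 1: eliminate objects with very few links.
    kept = {obj: ls for obj, ls in links.items() if len(ls) >= min_links}

    # Step 2: pick seed pairs greedily, most similar first, using each
    # object in at most one seed set (naive all-pairs search).
    pairs = sorted(
        ((jaccard(kept[a], kept[b]), a, b)
         for a, b in combinations(sorted(kept), 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    seeds, used = [], set()
    for sim, a, b in pairs:
        if sim < seed_threshold:
            break  # remaining pairs are even less similar
        if a in used or b in used:
            continue
        seeds.append({a, b})
        used.update((a, b))

    # Step 3: assign every remaining object to the most similar seed set,
    # comparing against the union of the seed members' links.
    seed_links = [set().union(*(kept[o] for o in s)) for s in seeds]
    clusters = [set(s) for s in seeds]
    for obj in sorted(kept):
        if obj in used or not seeds:
            continue  # seed members are placed; no seeds means no clusters
        best = max(range(len(seeds)),
                   key=lambda i: jaccard(kept[obj], seed_links[i]))
        clusters[best].add(obj)
    return clusters
```

Calling `initial_clusters` with a dictionary that maps, say, blog identifiers to the sets of posts they link to returns a list of sets, one per seed. Objects filtered out in the first step appear in no cluster, mirroring the abstract's point that sparsely linked objects are removed rather than clustered.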