S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

  • Authors:
  • Yuanzhe Cai;Pei Li;Hongyan Liu;Jun He;Xiaoyong Du

  • Affiliations:
  • Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Department of Management Science and Engineering, Tsinghua University, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China

  • Venue:
  • ADMA '08: Proceedings of the 4th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

Both content analysis and link analysis have their advantages in measuring relationships among documents. In this paper, we propose a new method that combines the two to compute the similarity of research papers, so that these papers can be clustered more accurately. To improve the efficiency of the similarity calculation, we develop a strategy that processes the relationship graph separately without affecting accuracy. We also design an approach that assigns different weights to different links between papers, which improves the accuracy of the similarity calculation. Experimental results on the ACM data set show that our new algorithm, S-SimRank, outperforms other algorithms.
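The abstract does not give S-SimRank's formulas. As a rough illustration of the ideas it names, the following Python sketch combines content similarity (cosine over term counts) with link similarity (plain SimRank over citations) and computes SimRank one weakly connected component at a time. The linear blend weight `lam`, the decay factor `c`, the iteration count, and all function names are hypothetical assumptions. This is not the authors' S-SimRank, and it omits their link-weighting scheme. For plain SimRank, the per-component split loses nothing: a node's in-neighbors always lie in its own weakly connected component, so scores between papers in different components are always zero.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse term-count vectors (dicts)."""
    dot = sum(cnt * v.get(term, 0) for term, cnt in u.items())
    nu = math.sqrt(sum(cnt * cnt for cnt in u.values()))
    nv = math.sqrt(sum(cnt * cnt for cnt in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def components(nodes, edges):
    """Weakly connected components of the citation graph (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return list(groups.values())


def simrank(nodes, in_nbrs, c=0.8, iters=10):
    """Plain iterative SimRank restricted to one component's nodes."""
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_nbrs[a], in_nbrs[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not ia or not ib:
                    new[(a, b)] = 0.0  # no citing papers: no link evidence
                else:
                    total = sum(sim[(i, j)] for i in ia for j in ib)
                    new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


def combined_similarity(docs, edges, lam=0.5, c=0.8, iters=10):
    """Hypothetical blend: lam * content cosine + (1 - lam) * SimRank."""
    in_nbrs = defaultdict(list)  # cited paper -> papers citing it
    for citing, cited in edges:
        in_nbrs[cited].append(citing)
    link = {}
    for comp in components(list(docs), edges):
        link.update(simrank(comp, in_nbrs, c, iters))
    scores = {}
    for a, b in combinations(sorted(docs), 2):
        s_link = link.get((a, b), 0.0)  # pairs across components score 0
        scores[(a, b)] = lam * cosine(docs[a], docs[b]) + (1 - lam) * s_link
    return scores


# Toy example (invented data): p3 cites p1 and p2; p5 cites p4.
docs = {
    "p1": {"simrank": 2, "graph": 1},
    "p2": {"simrank": 1, "graph": 1},
    "p3": {"cluster": 1, "text": 2},
    "p4": {"text": 1},
    "p5": {"cluster": 1},
}
edges = [("p3", "p1"), ("p3", "p2"), ("p5", "p4")]
scores = combined_similarity(docs, edges)
```

In the toy data, p1 and p2 share terms and are co-cited by p3, so they get the highest combined score; the resulting pairwise scores could then be fed to any clustering method that accepts a similarity matrix.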