S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

Authors:
Yuanzhe Cai; Pei Li; Hongyan Liu; Jun He; Xiaoyong Du
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China; Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China; Department of Management Science and Engineering, Tsinghua University, China; Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China; Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China
Venue:
ADMA '08: Proceedings of the 4th International Conference on Advanced Data Mining and Applications
Year:
2008

Abstract

Content analysis and link analysis each have advantages in measuring relationships among documents. In this paper, we propose a new method that combines the two to compute the similarity of research papers, so that the papers can be clustered more accurately. To improve the efficiency of similarity calculation, we develop a strategy that handles the relationship graph separately without affecting accuracy. We also design an approach that assigns different weights to different links to the papers, which improves the accuracy of similarity calculation. Experiments conducted on the ACM data set show that our new algorithm, S-SimRank, outperforms other algorithms.
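
The general idea of mixing a content score with a link score can be illustrated with a minimal sketch. This is not the S-SimRank algorithm itself, whose details the abstract does not give. It pairs plain SimRank over a citation graph with bag-of-words cosine similarity and blends them linearly. The function names, the blend weight `alpha`, the decay factor, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simrank(in_neighbors, decay=0.8, iterations=5):
    """Plain SimRank: two papers are similar if the papers citing them are similar.

    in_neighbors maps every paper to the list of papers that cite it.
    """
    nodes = list(in_neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, nodes):
            if u == v:
                new[(u, v)] = 1.0
                continue
            iu, iv = in_neighbors[u], in_neighbors[v]
            if not iu or not iv:
                # A paper with no citing papers has no link evidence.
                new[(u, v)] = 0.0
                continue
            total = sum(sim[(x, y)] for x, y in product(iu, iv))
            new[(u, v)] = decay * total / (len(iu) * len(iv))
        sim = new
    return sim


def combined_similarity(texts, in_neighbors, alpha=0.5):
    """Hypothetical linear blend of content and link similarity (an assumption)."""
    link = simrank(in_neighbors)
    return {
        (u, v): alpha * cosine_similarity(texts[u], texts[v]) + (1 - alpha) * link[(u, v)]
        for (u, v) in link
    }


# Toy data (invented): p3 cites both p1 and p2.
toy_texts = {
    "p1": "graph clustering",
    "p2": "graph clustering",
    "p3": "database indexing",
}
toy_in_neighbors = {"p1": ["p3"], "p2": ["p3"], "p3": []}
toy_scores = combined_similarity(toy_texts, toy_in_neighbors)
```

In the toy data, p1 and p2 share both their vocabulary and their only citing paper, so they score high on both components, while p3 shares neither with them. The resulting pairwise scores could be passed to any clustering routine that accepts a similarity matrix.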