On exploiting content and citations together to compute similarity of scientific papers

Authors:
Masoud Reyhani Hamedani;Sang-Wook Kim;Sang-Chul Lee;Dong-Jin Kim
Affiliations:
Hanyang University, Seoul, South Korea;Hanyang University, Seoul, South Korea;Hanyang University, Seoul, South Korea;NHN Institute of The Next Network, Seoul, South Korea
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

When computing the similarity of scientific papers, previous text-based measures consider only content and previous link-based measures consider only citations, so each sees just one side of a paper. In this paper, we propose SimCC, a novel approach that combines content and citation information to compute the similarity of scientific papers accurately. Unlike previous approaches, SimCC represents both the authority and the context of a scientific paper at the same time when computing similarities. We also propose SimCC+A, which takes recently published papers into account. Extensive experiments on a real-world dataset of scientific papers demonstrate the effectiveness of the proposed method, with more than 100% improvement in accuracy over previous methods.
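
The abstract does not give the SimCC formulation, so the following is not SimCC. It is only a minimal sketch of the general idea of using content and citations together, assuming a simple linear blend of two standard measures: cosine similarity over term-frequency vectors for content, and Jaccard overlap of reference sets for citations. The function names, the `alpha` weight, and the toy inputs are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(refs_a: set, refs_b: set) -> float:
    """Jaccard overlap of the reference sets of two papers."""
    union = refs_a | refs_b
    if not union:
        return 0.0
    return len(refs_a & refs_b) / len(union)


def combined_similarity(text_a, refs_a, text_b, refs_b, alpha=0.5):
    """Assumed blend (not from the paper): alpha weights the content score,
    (1 - alpha) weights the citation score."""
    content = cosine_similarity(text_a, text_b)
    citation = jaccard_similarity(refs_a, refs_b)
    return alpha * content + (1 - alpha) * citation


if __name__ == "__main__":
    # Toy papers: overlapping wording and one shared reference.
    score = combined_similarity(
        "link based similarity of papers", {"ref1", "ref2"},
        "text based similarity of papers", {"ref2", "ref3"},
    )
    print(round(score, 3))
```

A fixed linear blend like this treats the two signals as independent scores. The abstract indicates that SimCC instead represents the authority and the context of a paper simultaneously, so the sketch should be read only as a baseline for what combining content and citations can mean.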