An Information-Theoretic Definition of Similarity
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
SimRank: a measure of structural-context similarity
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Combining link and content analysis to estimate semantic similarity
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Recommending citations for academic papers
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Combining Link and Content Information for Scientific Topics Discovery
ICTAI '08 Proceedings of the 2008 20th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
P-Rank: a comprehensive structural similarity measure over information networks
Proceedings of the 18th ACM conference on Information and knowledge management
Using Kullback-Leibler distance for text categorization
ECIR'03 Proceedings of the 25th European conference on IR research
Yet another paper ranking algorithm advocating recent publications
Proceedings of the 19th international conference on World wide web
Scholarly paper recommendation via user's recent research interests
Proceedings of the 10th annual joint conference on Digital libraries
On computing text-based similarity in scientific literature
Proceedings of the 20th international conference companion on World wide web
When documents are very long, BM25 fails!
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Hi-index | 0.00 |
In computing the similarity of scientific papers, previous text-based and link-based similarity measures look at only a single side of the content and citations. In this paper, we propose a novel approach called SimCC that effectively combines the content and citation information to accurately compute the similarity of scientific papers. Unlike previous approaches, SimCC effectively represents both authority and context of a scientific paper simultaneously in computing similarities. Also, we propose SimCC+A to consider recently-published papers. The effectiveness of our proposed method is demonstrated via extensive experiments on a real-world dataset of scientific papers, with more than 100% improvement in accuracy compared with previous methods.