On combining text-based and link-based similarity measures for scientific papers

Authors: Masoud Reyhani Hamedani; Sang-Chul Lee; Sang-Wook Kim
Affiliations: Hanyang University, Seoul, Korea (all authors)
Venue: Proceedings of the 2013 Research in Adaptive and Convergent Systems
Year: 2013

Abstract

Text-based and link-based similarity measures for scientific papers each consider only one aspect of a paper: its content or its citations, respectively. In this paper, we propose a new approach that computes the similarity of scientific papers more accurately by combining the two kinds of measures. Our method considers the content and citations of papers simultaneously and uses SVMrank to combine the content-based and citation-based similarity scores. We demonstrate its effectiveness through extensive experiments on a real-world dataset of scientific papers. The results show that our approach improves accuracy by more than 20% compared with previous methods.
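
To make the combination step concrete, the sketch below shows one way two similarity scores per candidate paper could be merged by a learned linear ranking function. It is not the authors' implementation and does not call the SVMrank tool. As a stand-in, it trains a small pairwise hinge-loss ranker (the general idea behind ranking SVMs) with plain subgradient descent. The feature values, relevance labels, hyperparameters, and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairwise_differences(features, relevance):
    """Build x_hi - x_lo for every pair whose relevance labels differ."""
    diffs = []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue  # ties carry no ordering information
        hi, lo = (i, j) if relevance[i] > relevance[j] else (j, i)
        diffs.append([a - b for a, b in zip(features[hi], features[lo])])
    return diffs


def train_ranker(diffs, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on an L2-regularized pairwise hinge loss."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for k in range(len(w)):
                # The hinge term contributes only while the pair's margin is below 1.
                grad = reg * w[k] - (d[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def combined_score(w, x):
    """Weighted sum of the text-based and link-based similarity scores."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy data: (text_similarity, link_similarity) of five candidate
# papers with respect to one query paper, plus made-up relevance grades.
features = [(0.9, 0.8), (0.7, 0.2), (0.2, 0.9), (0.3, 0.1), (0.1, 0.2)]
relevance = [3, 2, 2, 1, 0]

w = train_ranker(pairwise_differences(features, relevance))
ranking = sorted(
    range(len(features)),
    key=lambda i: combined_score(w, features[i]),
    reverse=True,
)
print("learned weights:", w)
print("ranking (candidate indices, best first):", ranking)
```

In this sketch the learned weights play the role of the combination function: a candidate that scores well on both content and citations ranks above one that is strong on only a single side. Any text-based or link-based measure could supply the two input scores.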