Assessing the similarity of research papers is of interest in a variety of application contexts. The task is challenging, however, because the full text of the papers is often unavailable, so similarity must be determined from the papers' abstracts and a few additional features such as their authors, keywords, and the journals in which they were published. Our work explores several methods to exploit this information, first using methods based on the vector space model and then adapting language modeling techniques. In the first case, in addition to a number of standard approaches, we experiment with a form of explicit semantic analysis. In the second case, the basic strategy is to augment the information contained in the abstract by interpolating its language model with language models for the authors, keywords, and journal of the paper. This strategy is then extended by uncovering the latent topic structure of the collection using an adaptation of Latent Dirichlet Allocation in which the author-provided keywords guide the process. Experimental analysis shows that a well-considered use of these techniques significantly improves on the results of the standard vector space model approach.
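
The interpolation strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the maximum-likelihood unigram estimates, the mixture weights, and the use of KL divergence to compare two papers are all assumptions made for illustration, and the token lists are invented toy data.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linear mixture of language models; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocab = set().union(*models)
    return {
        word: sum(lam * model.get(word, 0.0) for model, lam in zip(models, weights))
        for word in vocab
    }


def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q). Words missing from q are floored at epsilon,
    a crude stand-in for proper smoothing (an assumption of this sketch)."""
    return sum(
        pw * math.log(pw / max(q.get(word, 0.0), epsilon))
        for word, pw in p.items()
        if pw > 0
    )


# Toy data (hypothetical): one paper's abstract plus models built from
# the other papers by its authors, its keywords, and its journal.
abstract_lm = unigram_lm("topic models for paper similarity".split())
author_lm = unigram_lm("language models for expert finding".split())
keyword_lm = unigram_lm("similarity topic lda".split())
journal_lm = unigram_lm("information retrieval models".split())

# Hypothetical weights: the abstract dominates, the other features refine it.
paper_lm = interpolate(
    [abstract_lm, author_lm, keyword_lm, journal_lm],
    [0.6, 0.15, 0.15, 0.1],
)
```

Two papers could then be compared by computing `kl_divergence` between their interpolated models, with a lower value suggesting more similar papers. In practice the weights would be tuned on held-out data rather than fixed by hand.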