Experiments with semantic similarity measures based on LDA and LSA

Authors:
Nobal Niraula, Rajendra Banjade, Dan Ştefănescu, Vasile Rus
Affiliations:
Department of Computer Science, The University of Memphis (all authors)
Venue:
SLSP '13: Proceedings of the First International Conference on Statistical Language and Speech Processing
Year:
2013

Abstract

In this paper, we present experiments with several semantic similarity measures based on Latent Dirichlet Allocation (LDA), an unsupervised method. For comparison, we also report results obtained with an algebraic method, Latent Semantic Analysis (LSA). The proposed measures were evaluated on two datasets: one containing student answers from conversational intelligent tutoring systems, and the Microsoft Research Paraphrase corpus, a standard paraphrase dataset. Results indicate that the method that represents words as topic vectors outperforms the methods based on distributions over topics and words. The proposed evaluation can also be regarded as an extrinsic, task-based method for evaluating topic coherence or for selecting the number of topics in LDA models.
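The contrast between representing words as topic vectors and comparing distributions over topics can be illustrated with a small sketch. It assumes that a trained LDA model has already produced a topic-word matrix (the hypothetical `PHI`, holding P(word | topic) in each row) and per-text topic distributions θ. The toy vocabulary and numbers are invented for illustration. The choices made here (summing word vectors into a text vector, cosine similarity, and Hellinger distance for the distribution-based comparison) are assumptions and may differ from the exact formulas used in the paper.

```python
import math

# Hypothetical toy LDA output: K = 3 topics over a 5-word vocabulary.
# PHI[k][w] = P(word w | topic k); each row sums to 1. Values are invented.
VOCAB = ["student", "answer", "force", "mass", "car"]
PHI = [
    [0.40, 0.40, 0.10, 0.05, 0.05],  # topic 0
    [0.05, 0.05, 0.45, 0.40, 0.05],  # topic 1
    [0.10, 0.10, 0.10, 0.10, 0.60],  # topic 2
]


def word_topic_vector(word):
    """Represent a word by its weight in every topic: [PHI[0][w], ..., PHI[K-1][w]]."""
    w = VOCAB.index(word)
    return [row[w] for row in PHI]


def text_vector(words):
    """Compose a text vector by summing the topic vectors of its in-vocabulary words.

    Summation is an assumption; out-of-vocabulary words are simply skipped.
    """
    vec = [0.0] * len(PHI)
    for word in words:
        if word in VOCAB:
            for k, value in enumerate(word_topic_vector(word)):
                vec[k] += value
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def distribution_similarity(theta_a, theta_b):
    """Turn the Hellinger distance between topic distributions into a similarity in [0, 1]."""
    return 1.0 - hellinger(theta_a, theta_b)


if __name__ == "__main__":
    physics_a = ["force", "mass"]
    physics_b = ["mass", "force", "car"]
    tutoring = ["student", "answer"]

    # Word-as-topic-vector comparison.
    print(cosine(text_vector(physics_a), text_vector(physics_b)))
    print(cosine(text_vector(physics_a), text_vector(tutoring)))

    # Distribution-based comparison; these theta values stand in for
    # hypothetical per-text topic distributions inferred by the LDA model.
    theta_a = [0.1, 0.8, 0.1]
    theta_b = [0.2, 0.6, 0.2]
    print(distribution_similarity(theta_a, theta_b))
```

Summing word vectors is only the simplest way to compose a text representation; aligning the words of the two texts pairwise is another option, and Kullback-Leibler or Jensen-Shannon divergence could replace the Hellinger distance in the distribution-based comparison.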