Assessing Student Paraphrases Using Lexical Semantics and Word Weighting

Authors:
Vasile Rus;Mihai Lintean;Art Graesser;Danielle McNamara
Affiliations:
Department of Computer Science, The University of Memphis, USA;Department of Computer Science, The University of Memphis, USA;Department of Psychology, The University of Memphis, USA;Department of Psychology, The University of Memphis, USA
Venue:
Proceedings of the 2009 conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling
Year:
2009

Abstract

This paper presents an approach to assessing student paraphrases in the intelligent tutoring system iSTART. The approach measures the semantic similarity between a student paraphrase and a reference text, called the textbase. Semantic similarity is estimated using knowledge-based word relatedness measures, which rely on knowledge encoded in WordNet, a lexical database of English. We also experiment with weighting words by their importance, derived from an analysis of word distributions in 2,225,726 Wikipedia documents. Performance is reported for 12 models, obtained by combining 3 relatedness measures, 2 word sense disambiguation methods, and 2 word-weighting schemes. The models are also compared with other approaches, such as Latent Semantic Analysis and the Entailer.
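
The abstract does not state the scoring formula, so the following is only a minimal sketch of one plausible reading: each textbase word is matched to its most related paraphrase word, and the matches are averaged with per-word importance weights. Everything named here is an assumption for illustration: the `TOY_RELATEDNESS` table stands in for a WordNet-based relatedness measure, inverse document frequency stands in for the Wikipedia-derived word importance, and the labels in the model grid are placeholders, since the abstract names neither the measures nor the methods.

```python
import math
from itertools import product

# Hypothetical relatedness scores; a real system would query a
# WordNet-based measure here instead of a hand-written table.
TOY_RELATEDNESS = {
    frozenset(["heart", "organ"]): 0.8,
    frozenset(["pumps", "moves"]): 0.7,
    frozenset(["blood", "fluid"]): 0.6,
}


def relatedness(w1, w2):
    """Return a score in [0, 1]; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset([w1, w2]), 0.0)


def idf_weights(doc_freq, n_docs):
    """Assumed weighting scheme: rarer words receive larger weights."""
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def weighted_similarity(paraphrase, textbase, weights=None):
    """Weighted mean, over textbase words, of the best match in the paraphrase."""
    if not paraphrase or not textbase:
        return 0.0
    total = 0.0
    norm = 0.0
    for t in textbase:
        # Words without a known weight default to 1.0.
        w = 1.0 if weights is None else weights.get(t, 1.0)
        best = max(relatedness(t, p) for p in paraphrase)
        total += w * best
        norm += w
    return total / norm if norm else 0.0


# Placeholder labels: 3 measures x 2 disambiguation methods x 2 weighting schemes.
MEASURES = ["measure_1", "measure_2", "measure_3"]
WSD_METHODS = ["wsd_1", "wsd_2"]
WEIGHTING_SCHEMES = ["scheme_1", "scheme_2"]
MODELS = list(product(MEASURES, WSD_METHODS, WEIGHTING_SCHEMES))

textbase = ["heart", "pumps", "blood"]
paraphrase = ["organ", "moves", "blood"]
# Made-up document frequencies in a made-up 100-document collection.
weights = idf_weights({"heart": 10, "pumps": 5, "blood": 50}, n_docs=100)

unweighted = weighted_similarity(paraphrase, textbase)
weighted = weighted_similarity(paraphrase, textbase, weights)
```

In this toy example the weighted score comes out lower than the unweighted one, because the exact match on "blood" is a frequent word and so contributes less once importance weights are applied. The grid enumeration simply confirms that the three factors described in the abstract multiply out to 12 models.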