Observing lemmatization effect in LSA coherence and comprehension grading of learner summaries

Authors:
Iraide Zipitria; Ana Arruarte; Jon Ander Elorriaga
Affiliation (all authors):
Language and Information Systems Department, Computer Science Faculty, University of the Basque Country, Donostia, Basque Country, Spain
Venue:
ITS'06 Proceedings of the 8th international conference on Intelligent Tutoring Systems
Year:
2006

Abstract

Current work on learner evaluation in Intelligent Tutoring Systems (ITSs) is moving towards the diagnosis of open-ended educational content. One of the main difficulties of this approach is automatically understanding natural language. Our work aims to produce automatic evaluation of learner summaries in Basque, so in addition to language comprehension, difficulties also arise from Basque morphology itself. In this work, Latent Semantic Analysis (LSA) is used to model comprehension in a language for which lemmatization has been shown to be highly significant. This paper tests the influence of corpus lemmatization on automatic comprehension and coherence grading. Summaries graded by human judges for coherence and comprehension were compared against LSA-based measures obtained from lemmatized and non-lemmatized source corpora. After lemmatization, the number of single terms known to LSA was reduced by 56%. The resulting LSA grades almost matched human measures, with no significant differences between the lemmatized and non-lemmatized approaches.
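The abstract does not spell out the grading pipeline, so the following is only a minimal sketch of the general idea under stated assumptions: a raw-count term-document matrix (LSA work often applies log-entropy weighting, omitted here), a truncated SVD computed with `numpy.linalg.svd`, fold-in of new texts as d^T U_k S_k^{-1}, and cosine similarity between a summary and its source text used as the grade. The lemma table `TOY_LEMMAS`, the four-sentence corpus, and every function name are hypothetical stand-ins (no real Basque lemmatizer is used), and the printed numbers are illustrative, not results from the paper.

```python
import re

import numpy as np

# Hypothetical toy lemma table: a stand-in for a real Basque lemmatizer,
# which the abstract does not name. Unlisted word forms are kept unchanged.
TOY_LEMMAS = {
    "etxea": "etxe", "etxeak": "etxe", "etxean": "etxe",
    "mendian": "mendi", "mendira": "mendi", "mendiak": "mendi",
    "joaten": "joan", "dago": "egon", "gaude": "egon", "gara": "izan",
}


def tokenize(text, lemmatize=False):
    """Lowercase word tokens, optionally mapped to their (toy) lemmas."""
    tokens = re.findall(r"\w+", text.lower())
    if lemmatize:
        tokens = [TOY_LEMMAS.get(t, t) for t in tokens]
    return tokens


def build_lsa(corpus, k, lemmatize=False):
    """Raw-count term-document matrix reduced to k dimensions by SVD."""
    docs = [tokenize(d, lemmatize) for d in corpus]
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            counts[index[t], j] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :k], s[:k]


def fold_in(text, model, lemmatize=False):
    """Project a new text into the LSA space as d^T U_k S_k^{-1}."""
    index, u_k, s_k = model
    vec = np.zeros(len(index))
    for t in tokenize(text, lemmatize):
        if t in index:  # terms unknown to the model are ignored
            vec[index[t]] += 1
    return vec @ u_k / s_k


def lsa_grade(summary, source, model, lemmatize=False):
    """Cosine similarity between a summary and its source in LSA space."""
    a = fold_in(summary, model, lemmatize)
    b = fold_in(source, model, lemmatize)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0


def vocabulary_reduction(corpus):
    """Fraction by which lemmatization shrinks the set of distinct terms."""
    raw = {t for d in corpus for t in tokenize(d)}
    lemmas = {t for d in corpus for t in tokenize(d, lemmatize=True)}
    return 1 - len(lemmas) / len(raw)


corpus = [
    "etxea mendian dago",
    "mendira joaten gara",
    "etxeak eta mendiak",
    "etxean gaude",
]
print(round(vocabulary_reduction(corpus), 3))  # 11 forms -> 6 lemmas: 0.455
lemma_model = build_lsa(corpus, k=2, lemmatize=True)
print(round(lsa_grade("etxean", "etxea", lemma_model, lemmatize=True), 3))  # 1.0
```

In this toy corpus no inflected form occurs in more than one sentence, so without lemmatization LSA has no co-occurrence evidence linking *etxean* ("in the house") to *etxea* ("the house"); after lemmatization both collapse into the single term *etxe*, which is the kind of vocabulary shrinkage the abstract measures for Basque.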