Observing lemmatization effect in LSA coherence and comprehension grading of learner summaries

  • Authors:
  • Iraide Zipitria;Ana Arruarte;Jon Ander Elorriaga

  • Affiliations:
  • Language and Information Systems Department, Computer Science Faculty, University of the Basque Country, Donostia, Basque Country, Spain (all three authors)

  • Venue:
  • ITS'06 Proceedings of the 8th international conference on Intelligent Tutoring Systems
  • Year:
  • 2006

Abstract

Current work on learner evaluation in Intelligent Tutoring Systems (ITSs) is moving towards the diagnosis of open-ended educational content. One of the main difficulties of this approach is understanding natural language automatically. Our work aims to evaluate learner summaries in Basque automatically. Therefore, in addition to language comprehension, difficulties arise from Basque morphology itself. In this work, Latent Semantic Analysis (LSA) is used to model comprehension in a language in which lemmatization has been shown to be highly significant. This paper tests the influence of corpus lemmatization on automatic comprehension and coherence grading. Summaries graded by human judges for coherence and comprehension were compared against LSA-based measures derived from lemmatized and non-lemmatized source corpora. After lemmatization, the number of single terms known to LSA was reduced by 56%. The results show that LSA grades almost match human measures, with no significant differences between the lemmatized and non-lemmatized approaches.
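The vocabulary reduction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `LEMMAS` lookup table, the helper names `vocabulary` and `reduction`, and the seven-token sample are all hypothetical, and a real system for Basque would rely on a morphological analyzer rather than a hand-written dictionary. The sketch only shows why collapsing inflected forms onto lemmas shrinks the set of distinct terms that an LSA term-document matrix has to index.

```python
# Toy illustration (assumption): a hand-written lemma table stands in for
# a real Basque morphological analyzer.
LEMMAS = {
    "etxea": "etxe", "etxean": "etxe", "etxeko": "etxe", "etxera": "etxe",
    "liburua": "liburu", "liburuak": "liburu",
}


def vocabulary(tokens, lemmatize=False):
    """Return the set of distinct terms, optionally mapped to lemmas."""
    if lemmatize:
        # Unknown tokens are kept unchanged.
        tokens = [LEMMAS.get(t, t) for t in tokens]
    return set(tokens)


def reduction(tokens):
    """Fraction by which lemmatization shrinks the vocabulary."""
    raw = vocabulary(tokens)
    lemmatized = vocabulary(tokens, lemmatize=True)
    return 1 - len(lemmatized) / len(raw)


# Hypothetical sample: inflected forms of "etxe" (house) and "liburu" (book).
tokens = "etxea etxean etxeko etxera liburua liburuak eta".split()
raw_vocab = vocabulary(tokens)
lemma_vocab = vocabulary(tokens, lemmatize=True)
print(len(raw_vocab), len(lemma_vocab), round(reduction(tokens), 2))
```

In an agglutinative language such as Basque, a single lemma can surface in many inflected forms, so the non-lemmatized corpus yields a much larger and sparser term space than the lemmatized one. The figure printed for this toy sample carries no empirical weight.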