Measuring improvement in latent semantic analysis-based marking systems: using a computer to mark questions about HTML

  • Authors:
  • Debra T. Haley; Pete Thomas; Anne De Roeck; Marian Petre

  • Affiliations:
  • The Open University, Walton Hall, Milton Keynes, UK (all authors)

  • Venue:
  • ACE '07: Proceedings of the Ninth Australasian Conference on Computing Education - Volume 66
  • Year:
  • 2007


Abstract

This paper proposes two unconventional metrics as important tools for assessment research: the Manhattan (L1) and Euclidean (L2) distance measures. We used them to evaluate the results of a Latent Semantic Analysis (LSA) system that assesses short answers to two questions about HTML in an introductory computer science class. This is, as far as we know, the only study that addresses how well an LSA-based system can evaluate answers in the very specific and technical language of HTML. We found that, although the literature offers several ways to measure automatic assessment results, none was useful for our purpose: comparing the marks given by LSA to the marks awarded by a human tutor. We demonstrate how L1 and L2 quantify the results of varying the amount of training data needed to enable LSA to mark the answers to the two HTML questions. Although this paper describes the use of the metrics in one particular case, they have more general applicability: much fine-tuning of an LSA marking system is required for good results, and a researcher needs an easy way to evaluate the effect of each modification to the system. The Manhattan and Euclidean distance measures provide this functionality.
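The paper itself contains no code, but the two metrics are simple to state. Below is a minimal sketch, not drawn from the paper, assuming each system's marks are stored as equal-length numeric sequences; the names `lsa_marks` and `tutor_marks` are hypothetical placeholders for the two sets of marks being compared.

```python
import math

def manhattan_distance(a, b):
    """L1 distance: sum of absolute differences between paired marks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical marks for five answers, one value per student answer
lsa_marks   = [3.0, 2.5, 4.0, 1.0, 3.5]
tutor_marks = [3.0, 3.0, 4.0, 2.0, 3.0]

print(manhattan_distance(lsa_marks, tutor_marks))  # 2.0
print(euclidean_distance(lsa_marks, tutor_marks))  # ~1.2247
```

The smaller the distance, the more closely the LSA marks track the tutor's, so recomputing the distances after each change (for example, to the amount of training data) yields a single number by which configurations can be compared.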