Towards automatic scoring of non-native spontaneous speech

Authors:
Klaus Zechner; Isaac I. Bejar
Affiliations:
Educational Testing Service, Princeton, NJ; Educational Testing Service, Princeton, NJ
Venue:
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Year:
2006

Abstract

This paper investigates the feasibility of automated scoring of spoken English proficiency of non-native speakers. Unlike existing automated assessments of spoken English, our data consists of spontaneous spoken responses to complex test items. We perform both a quantitative and a qualitative analysis of features extracted from these responses using two different machine learning approaches. (1) We use support vector machines to produce a score and evaluate it with respect to a mode baseline and to human rater agreement. We find that scoring based on support vector machines yields accuracies approaching inter-rater agreement in some cases. (2) We use classification and regression trees to understand the role of different features and feature classes in the characterization of speaking proficiency by human scorers. Our analysis shows that, across all the test items, most or all of the feature classes are used in the nodes of the trees, suggesting that the scores are, appropriately, a combination of multiple components of speaking proficiency. Future research will concentrate on extending the set of features and introducing new feature classes to arrive at a scoring model that comprises additional relevant aspects of speaking proficiency.
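
The evaluation described in the abstract compares a system's score accuracy with two reference points: a mode baseline and human rater agreement. The sketch below shows one plausible way to compute these quantities. It is an illustration only, not the authors' implementation: the function names, the 1-4 score scale, and all score values are hypothetical, and exact agreement is assumed as the agreement measure.

```python
from collections import Counter


def mode_baseline_accuracy(train_scores, test_scores):
    """Accuracy obtained by always predicting the most frequent training score."""
    mode_score, _ = Counter(train_scores).most_common(1)[0]
    hits = sum(1 for s in test_scores if s == mode_score)
    return hits / len(test_scores)


def exact_agreement(scores_a, scores_b):
    """Fraction of responses on which two score lists agree exactly."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


# Hypothetical toy data on an assumed 1-4 scale (not from the paper).
train = [3, 3, 2, 4, 3, 2, 3, 1]   # human scores seen in training
rater1 = [3, 2, 3, 4, 3, 2]        # first human rater on held-out responses
rater2 = [3, 2, 3, 4, 2, 2]        # second human rater on the same responses
model = [3, 2, 3, 3, 3, 3]         # stand-in for a classifier's predicted scores

baseline = mode_baseline_accuracy(train, rater1)  # lower reference point
human = exact_agreement(rater1, rater2)           # upper reference point
system = exact_agreement(model, rater1)           # system accuracy vs. rater 1

print(f"mode baseline: {baseline:.3f}")
print(f"system:        {system:.3f}")
print(f"human-human:   {human:.3f}")
```

In this toy setup the system falls between the two reference points, which is the pattern the abstract reports for some test items; with real data, `model` would hold the predictions of the trained classifier.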