This paper presents the first version of the SpeechRater(SM) system for automatically scoring non-native spontaneous high-entropy speech in the context of an online practice test for prospective takers of the Test of English as a Foreign Language® internet-based test (TOEFL® iBT). The system consists of a speech recognizer trained on non-native English speech data; a feature computation module that uses the recognizer's output to compute a set of mostly fluency-based features; and a multiple regression scoring model that predicts a speaking proficiency score for each test item response from a subset of the features generated by the previous component. Experiments with classification and regression trees (CART) complement those performed with multiple regression. We evaluate the system both on TOEFL Practice Online (TPO) data and on Field Study data collected before the introduction of the TOEFL iBT. Features are selected by test development experts based both on their empirical correlations with human scores and on their coverage of the concept of communicative competence. We conclude that although the correlation between machine and human scores on TPO (Pearson r = 0.57 on complete sets of six items) still falls 0.17 short of the inter-human correlation (r = 0.74), it is high enough to warrant deployment of the system in a low-stakes practice environment, given its coverage of several important aspects of communicative competence such as fluency, vocabulary diversity, grammar, and pronunciation.
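The agreement figures above are Pearson product-moment correlations between machine-assigned and human-assigned scores. As a minimal illustrative sketch (the toy score lists below are invented, not data from the paper), the coefficient can be computed from paired scores as follows:

```python
import math

def pearson_r(xs, ys):
    # Pearson product-moment correlation between two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical machine vs. human scores on five responses.
machine = [2.5, 3.0, 3.5, 2.0, 4.0]
human = [3.0, 3.0, 4.0, 2.0, 3.5]
print(round(pearson_r(machine, human), 2))  # → 0.85
```

In the paper's setting, the two lists would hold machine and human scores over complete six-item TPO sets, yielding the reported r = 0.57.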
Deployment in a low-stakes practice environment is further warranted because this system is the initial version of a long-term research and development program: features related to vocabulary, grammar, and content will be added at a later stage, as automatic speech recognition performance improves, without requiring a re-design of the system. Exact agreement on single TPO items between our system and human scores was 57.8%, essentially on par with inter-human agreement of 57.2%. Our system has been in operational use to score TOEFL Practice Online Speaking tests since the Fall of 2006 and has since scored tens of thousands of tests.
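Exact agreement, in contrast to correlation, is simply the fraction of items on which the two raters assign the identical discrete score level. A minimal sketch with hypothetical score vectors (the data below are invented for illustration):

```python
def exact_agreement(a, b):
    # Fraction of items where both raters assign the identical score level.
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical per-item score levels from the machine and a human rater.
machine = [3, 2, 4, 3, 3]
human = [3, 2, 3, 3, 4]
print(exact_agreement(machine, human))  # → 0.6
```

Applied to single TPO items, this is the statistic on which the system reached 57.8% against human scores, versus 57.2% between human raters.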