We report two new approaches to building the scoring models used by automated speech scoring systems. First, we introduce the Cumulative Logit Model (CLM), widely used in statistics for modeling ordinal categorical outcomes, to automated speech scoring. On a large set of responses to an English proficiency test, we systematically compare the CLM with two widely used scoring models, linear regression and decision trees. Our experiments suggest that the CLM offers better scoring performance and greater robustness to limited training data. Second, we propose a novel way to incorporate human rating into automated speech scoring: applying accurate human ratings to a small set of responses can improve the whole system's performance while still meeting cost and score-reporting time requirements. We find that the scoring difficulty of each spoken response, which can be modeled by the degree to which it challenges human raters, provides a way to select an optimal set of responses for human scoring. In a simulation, we show that focusing human scoring on challenging responses yields a larger improvement in scoring performance than applying it to the same number of randomly selected responses.
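To make the first approach concrete, below is a minimal sketch of fitting a cumulative logit (proportional-odds) model that maps speech features to ordinal scores. The two delivery features, the 1-4 score scale, and the synthetic data are illustrative assumptions, not details taken from the paper; the sketch uses statsmodels' OrderedModel, one of several implementations of the CLM.

```python
# A minimal sketch of a Cumulative Logit Model (proportional-odds ordinal
# regression) mapping speech features to ordinal proficiency scores.
# The feature names, the 1-4 scale, and the data are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500

# Hypothetical delivery features (e.g., speaking rate, pause frequency).
X = pd.DataFrame({
    "speaking_rate": rng.normal(4.0, 0.8, n),
    "pause_freq": rng.normal(0.5, 0.2, n),
})

# Synthetic ordinal scores on a 1-4 scale driven by a latent continuous
# index, the data-generating assumption behind the cumulative logit model.
latent = 1.5 * X["speaking_rate"] - 2.0 * X["pause_freq"] + rng.logistic(size=n)
score = pd.cut(latent, bins=[-np.inf, 4.0, 5.5, 7.0, np.inf],
               labels=[1, 2, 3, 4]).astype(int)

# Fit P(score <= j | x) = logistic(alpha_j - x @ beta), with beta shared
# across all score thresholds.
model = OrderedModel(score, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())

# Per-response class probabilities; the reported score can be the most
# probable class or the probability-weighted expected score.
probs = np.asarray(result.predict(X))
predicted = probs.argmax(axis=1) + 1  # map column index back to the 1-4 scale
```

Because the CLM shares one coefficient vector across all score thresholds, it has relatively few parameters to estimate, which is consistent with the robustness to limited training data reported in the abstract.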
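The second approach can likewise be illustrated with a toy simulation: spend a fixed human-scoring budget on the most "difficult" responses rather than a random sample. Here difficulty is proxied by the machine scorer's predictive entropy, whereas the paper models it by how much a response challenges human raters; that proxy, the sample sizes, and the accuracy metric are all assumptions for illustration only.

```python
# A toy simulation: route a fixed budget of human ratings to the most
# uncertain responses and keep machine scores for the rest. All quantities
# here are synthetic; difficulty is proxied by predictive entropy.
import numpy as np

rng = np.random.default_rng(1)
n, budget = 1000, 100                      # responses and affordable human ratings

true_scores = rng.integers(1, 5, size=n)   # ground-truth scores on a 1-4 scale

# Machine class probabilities: confident on most responses, near-uniform
# (hence error-prone) on a random "hard" subset.
probs = np.full((n, 4), 0.05)
probs[np.arange(n), true_scores - 1] = 0.85
hard = rng.choice(n, size=300, replace=False)
probs[hard] = rng.dirichlet(np.ones(4), size=hard.size)

machine_scores = probs.argmax(axis=1) + 1
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # uncertainty per response

def accuracy_with_human(selected):
    """Exact-agreement accuracy after humans re-score the selected set."""
    scores = machine_scores.copy()
    scores[selected] = true_scores[selected]  # human ratings assumed accurate
    return (scores == true_scores).mean()

targeted = np.argsort(entropy)[-budget:]                 # most uncertain responses
random_pick = rng.choice(n, size=budget, replace=False)  # baseline policy
print("targeted selection:", accuracy_with_human(targeted))
print("random selection:  ", accuracy_with_human(random_pick))
```

On this synthetic data, the targeted policy should recover noticeably more accuracy per human rating than random selection, mirroring the pattern the abstract reports from its simulation.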