We report two new approaches to building the scoring models used by automated speech scoring systems. First, we introduce the Cumulative Logit Model (CLM), widely used in statistics for modeling ordinal categorical outcomes, to automated speech scoring. On a large set of responses to an English proficiency test, we systematically compare the CLM with two widely used scoring models, linear regression and decision trees. Our experiments suggest that the CLM offers better scoring performance and greater robustness to limited training data. Second, we propose a novel way to incorporate human rating into automated speech scoring: applying accurate human ratings to a small set of responses can improve the whole system's performance while still meeting cost and score-reporting time requirements. We find that the scoring difficulty of each spoken response, which can be modeled by the degree to which it challenges human raters, provides a way to select an optimal set of responses for human scoring. In a simulation, we show that focusing human scoring on challenging responses yields a larger improvement in scoring performance than applying it to the same number of randomly selected responses.
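To make the first approach concrete, below is a minimal sketch of fitting a cumulative logit (proportional-odds) model that maps speech features to ordinal scores. The two delivery features, the 1-4 score scale, and the synthetic data are illustrative assumptions, not details taken from the paper; the sketch uses statsmodels' OrderedModel, one of several implementations of the CLM.

```python
# A minimal sketch of a Cumulative Logit Model (proportional-odds ordinal
# regression) mapping speech features to ordinal proficiency scores.
# The feature names, the 1-4 scale, and the data are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500

# Hypothetical delivery features (e.g., speaking rate, pause frequency).
X = pd.DataFrame({
    "speaking_rate": rng.normal(4.0, 0.8, n),
    "pause_freq": rng.normal(0.5, 0.2, n),
})

# Synthetic ordinal scores on a 1-4 scale driven by a latent continuous
# index, the data-generating assumption behind the cumulative logit model.
latent = 1.5 * X["speaking_rate"] - 2.0 * X["pause_freq"] + rng.logistic(size=n)
score = pd.cut(latent, bins=[-np.inf, 4.0, 5.5, 7.0, np.inf],
               labels=[1, 2, 3, 4]).astype(int)

# Fit P(score <= j | x) = logistic(alpha_j - x @ beta), with beta shared
# across all score thresholds.
model = OrderedModel(score, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())

# Per-response class probabilities; the reported score can be the most
# probable class or the probability-weighted expected score.
probs = np.asarray(result.predict(X))
predicted = probs.argmax(axis=1) + 1  # map column index back to the 1-4 scale
```

Because the CLM shares one coefficient vector across all score thresholds, it has relatively few parameters to estimate, which is consistent with the robustness to limited training data reported in the abstract.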
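The second approach can likewise be illustrated with a toy simulation: spend a fixed human-scoring budget on the most "difficult" responses rather than a random sample. Here difficulty is proxied by the machine scorer's predictive entropy, whereas the paper models it by how much a response challenges human raters; that proxy, the sample sizes, and the accuracy metric are all assumptions for illustration only.

```python
# A toy simulation: route a fixed budget of human ratings to the most
# uncertain responses and keep machine scores for the rest. All quantities
# here are synthetic; difficulty is proxied by predictive entropy.
import numpy as np

rng = np.random.default_rng(1)
n, budget = 1000, 100                      # responses and affordable human ratings

true_scores = rng.integers(1, 5, size=n)   # ground-truth scores on a 1-4 scale

# Machine class probabilities: confident on most responses, near-uniform
# (hence error-prone) on a random "hard" subset.
probs = np.full((n, 4), 0.05)
probs[np.arange(n), true_scores - 1] = 0.85
hard = rng.choice(n, size=300, replace=False)
probs[hard] = rng.dirichlet(np.ones(4), size=hard.size)

machine_scores = probs.argmax(axis=1) + 1
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # uncertainty per response

def accuracy_with_human(selected):
    """Exact-agreement accuracy after humans re-score the selected set."""
    scores = machine_scores.copy()
    scores[selected] = true_scores[selected]  # human ratings assumed accurate
    return (scores == true_scores).mean()

targeted = np.argsort(entropy)[-budget:]                 # most uncertain responses
random_pick = rng.choice(n, size=budget, replace=False)  # baseline policy
print("targeted selection:", accuracy_with_human(targeted))
print("random selection:  ", accuracy_with_human(random_pick))
```

On this synthetic data, the targeted policy should recover noticeably more accuracy per human rating than random selection, mirroring the pattern the abstract reports from its simulation.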