This paper focuses on identifying, extracting, and evaluating features related to the syntactic complexity of spontaneous spoken responses. The work is part of an effort to expand the feature set of an automated speech scoring system to cover additional aspects considered important in the construct of communicative competence. Our goal is twofold: first, to find effective features, drawn from a large set of previously proposed features together with some new features designed analogously from a syntactic complexity perspective, that correlate well with human ratings of the same spoken responses; and second, to build automatic scoring models based on the most promising features using machine learning methods. On human transcriptions with manually annotated clause and sentence boundaries, our best scoring model achieves an overall Pearson correlation with human rater scores of r=0.49 on an unseen test set, whereas models that rely on sentence or clause boundaries produced by automated classifiers reach correlations of only around r=0.2.
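The paper itself does not include code, but the pipeline the abstract describes can be sketched: rank candidate syntactic-complexity features by their Pearson correlation with human scores, keep the strongest ones, fit a regression scoring model, and report the correlation on a held-out set. The data, the correlation threshold, and the choice of plain linear regression below are all assumptions for illustration; the authors' actual feature set, learner, and evaluation data differ.

```python
# Minimal sketch, under the assumptions stated above: correlation-based
# feature selection followed by a regression scoring model.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: rows = spoken responses, columns = candidate
# syntactic-complexity features (e.g., mean clause length, clauses per
# sentence); y = human rater scores for the same responses.
n_responses, n_features = 500, 20
X = rng.normal(size=(n_responses, n_features))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.8, size=n_responses)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Feature selection: keep features whose training-set correlation with
# the human scores exceeds a threshold (the 0.1 cutoff is an assumption).
corrs = np.array([pearsonr(X_train[:, j], y_train)[0]
                  for j in range(n_features)])
selected = np.flatnonzero(np.abs(corrs) > 0.1)

# Scoring model on the selected features; linear regression stands in
# for whatever machine learning method the authors actually used.
model = LinearRegression().fit(X_train[:, selected], y_train)
pred = model.predict(X_test[:, selected])

# Evaluation mirrors the abstract: Pearson r between machine scores and
# human rater scores on the unseen test set.
r, _ = pearsonr(pred, y_test)
print(f"held-out Pearson r = {r:.2f} with {selected.size} features")
```

In the setting the abstract reports, the features are computed on top of clause and sentence boundaries, which come either from manual annotation or from automated classifiers; that upstream step, not the scoring model, is where the gap between r=0.49 and roughly r=0.2 arises.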