Using an ontology for improved automated content scoring of spontaneous non-native speech

Authors:
Miao Chen; Klaus Zechner
Affiliations:
Syracuse University, Syracuse, NY; Educational Testing Service, Princeton, NJ
Venue:
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
Year:
2012

Abstract

This paper explores automated content scoring of spontaneous non-native speech, using ontology-based information to enhance a vector space approach. We use content vector analysis as a baseline and evaluate the correlations between human rater proficiency scores and two cosine-similarity-based features previously used in automated essay scoring. We use two ontology-facilitated approaches to improve these feature correlations by exploiting the semantic knowledge encoded in WordNet: (1) extending word vectors with semantic concepts from the WordNet ontology (synsets); and (2) using a reasoning approach that exploits the hierarchical structure of WordNet to estimate the weights of concepts not present in the set of training responses. Furthermore, we compare features computed from human transcriptions of spoken responses with features computed from the output of an automatic speech recognizer. We find that (1) for one of the two features, both ontology-based approaches improve average feature correlations with human scores, and (2) the correlations for both features decrease only marginally when moving from human transcriptions to speech recognizer output.
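The abstract does not spell out the feature definitions, so the Python sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. It builds one term-count vector per score level from training responses and scores a new response by cosine similarity. The two features are assumed here to be the score level whose vector is most similar and the cosine to the highest score level. The sketch can optionally append synset counts to each vector, and it estimates the weight of an unseen concept by backing off to its nearest ancestor seen in training. `TOY_SYNSETS`, `TOY_HYPERNYM`, the `SYN:` prefix and the `decay` factor are hypothetical stand-ins for real WordNet lookups and for whatever weighting the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical toy maps standing in for real WordNet lookups.
TOY_SYNSETS = {
    "car": ["motor_vehicle.n.01"],
    "automobile": ["motor_vehicle.n.01"],
    "bus": ["public_transport.n.01"],
    "train": ["public_transport.n.01"],
}
TOY_HYPERNYM = {"sedan.n.01": "motor_vehicle.n.01",
                "motor_vehicle.n.01": "vehicle.n.01"}


def to_vector(tokens, use_synsets=False):
    """Raw term counts, optionally extended with one count per synset hit."""
    vec = Counter(tokens)
    if use_synsets:
        for tok in tokens:
            for syn in TOY_SYNSETS.get(tok, []):
                vec["SYN:" + syn] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_level_vectors(training, use_synsets=False):
    """Aggregate all training responses sharing a score into one vector."""
    levels = {}
    for tokens, score in training:
        levels.setdefault(score, Counter()).update(to_vector(tokens, use_synsets))
    return levels


def cva_features(tokens, levels, use_synsets=False):
    """Assumed features: (most similar score level, cosine to top level)."""
    vec = to_vector(tokens, use_synsets)
    sims = {s: cosine(vec, v) for s, v in levels.items()}
    return max(sims, key=sims.get), sims[max(levels)]


def estimate_weight(concept, weights, decay=0.5):
    """Hypothetical back-off: use the nearest ancestor seen in training,
    discounted by `decay` for each step up the hierarchy."""
    factor = 1.0
    while concept is not None:
        if concept in weights:
            return factor * weights[concept]
        concept = TOY_HYPERNYM.get(concept)
        factor *= decay
    return 0.0


if __name__ == "__main__":
    training = [("i take the bus to work".split(), 4),
                ("the train is fast".split(), 4),
                ("car is good".split(), 2)]
    test = "i drive my automobile".split()
    print(cva_features(test, score_level_vectors(training))[0])  # 4
    print(cva_features(test, score_level_vectors(training, True), True)[0])  # 2
    print(estimate_weight("sedan.n.01", {"motor_vehicle.n.01": 2.0}))  # 1.0
```

In this toy run the test response shares no words with the score-2 training response. Adding synsets lets "automobile" match "car" through a shared concept, which shifts the most similar score level from 4 to 2. A realistic setup would likely also apply tf-idf weighting and stopword removal, both of which this sketch omits.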