Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children

Authors:
Keyur Gabani;Thamar Solorio;Yang Liu;Khairun-nisa Hassanali;Christine A. Dollaghan
Affiliations:
Department of Computer Science, The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA;Department of Computer and Information Sciences, The University of Alabama at Birmingham, 1300 Univ. Blvd., Birmingham, AL 35294, USA;Department of Computer Science, The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA;Department of Computer Science, The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA;Department of Communication Sciences and Disorders, The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA
Venue:
Artificial Intelligence in Medicine
Year:
2011

Abstract

Objectives: This paper explores the use of an automated method for analyzing narratives of monolingual English-speaking children to accurately predict the presence or absence of a language impairment. The goal is to exploit corpus-based approaches inspired by the fields of natural language processing and machine learning.

Methods and materials: We extract a large variety of features from language samples and use them to train language models and well-known machine learning algorithms as the underlying predictors. The methods are evaluated on two different datasets and three language tasks. One dataset contains samples of two spontaneous narrative tasks performed by 118 children with an average age of 13 years; the second dataset contains play sessions from over 600 younger children with an average age of 6 years.

Results: We compare our results against a cut-off baseline method and show that they are far superior, reaching F-measures of over 85% in two of the three language tasks and 48% in the third.

Conclusions: The experiments presented here show that corpus-based approaches can yield good prediction results for the problem of language impairment detection. These findings warrant further exploration of natural language processing techniques in the field of communication disorders. Moreover, the proposed framework can be easily adapted to analyze samples in languages other than English, since most of the features are language independent or can be customized with little effort.
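To make the language-model idea and the F-measure concrete, here is a minimal sketch. It is not the authors' implementation. The abstract does not specify the n-gram order, the smoothing method, or the feature set, so the add-one-smoothed bigram models over part-of-speech tags, the class labels "impaired" and "typical", and the toy tag sequences below are all assumptions made for illustration.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sequence boundary markers


def train_bigram_counts(sequences):
    """Count bigram contexts and bigrams over a list of tag sequences."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = [BOS] + list(seq) + [EOS]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return contexts, bigrams


def perplexity(seq, model, vocab_size):
    """Perplexity of one sequence under an add-one-smoothed bigram model."""
    contexts, bigrams = model
    padded = [BOS] + list(seq) + [EOS]
    log_prob = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))


def classify(seq, typical_model, impaired_model, vocab_size):
    """Pick the class whose language model gives the lower perplexity."""
    pp_typical = perplexity(seq, typical_model, vocab_size)
    pp_impaired = perplexity(seq, impaired_model, vocab_size)
    return "impaired" if pp_impaired < pp_typical else "typical"


def f_measure(gold, predicted, positive="impaired"):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from the paper's corpora).
typical_train = [["PRP", "VBD", "DT", "NN"], ["DT", "NN", "VBD", "PRP"]]
impaired_train = [["PRP", "VB", "NN"], ["NN", "VB", "NN"]]
tags = {t for seq in typical_train + impaired_train for t in seq}
VOCAB_SIZE = len(tags) + 1  # +1 for the end-of-sequence marker

typical_model = train_bigram_counts(typical_train)
impaired_model = train_bigram_counts(impaired_train)

if __name__ == "__main__":
    sample = ["PRP", "VB", "NN"]
    print(classify(sample, typical_model, impaired_model, VOCAB_SIZE))
```

In a realistic setting the two models would be trained on held-out transcripts from each group, and the predicted labels for unseen children would be passed to `f_measure` together with the clinical labels.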