Corpus studies in word prediction

  • Authors: Keith Trnka; Kathleen F. McCoy
  • Affiliations: University of Delaware; University of Delaware
  • Venue: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility
  • Year: 2007

Abstract

Word prediction can be used to enhance the communication rate of people with disabilities who use Augmentative and Alternative Communication (AAC) devices. We use statistical methods in a word prediction system; the statistical models are trained on a corpus, and we then measure the efficacy of the resulting system by calculating the theoretical keystroke savings on held-out data. Ideally, training and testing should be done on a large corpus of AAC text covering a variety of topics, but no such corpus exists. We discuss training and testing on a wide variety of corpora meant to approximate text from AAC users. We show that training on a combination of in-domain and out-of-domain data is often more beneficial than training on either data set alone, and that advanced language modeling such as topic modeling is portable even when applied to very different text.
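
To make the evaluation metric concrete, the following is a minimal sketch of how theoretical keystroke savings on held-out text might be computed. It is not the authors' system. The unigram predictor, the prediction window size, the one-keystroke cost of selecting a prediction, and the assumption that a selected word includes its trailing space are all illustrative assumptions, and every function name is hypothetical. Keystroke savings is taken here as the percentage of keystrokes avoided relative to typing every character.

```python
from collections import Counter


def train_unigram(tokens):
    """Count word frequencies in the training corpus (assumed model)."""
    return Counter(tokens)


def predict(counts, prefix, window=5):
    """Return up to `window` most frequent words that start with `prefix`."""
    candidates = [w for w in counts if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-counts[w], w))  # frequency, then alphabetical
    return candidates[:window]


def keystrokes_for_word(counts, word, window=5):
    """Keystrokes needed to enter `word` plus its trailing space."""
    for typed in range(len(word) + 1):
        if word in predict(counts, word[:typed], window):
            return typed + 1  # letters typed so far + one selection key
    return len(word) + 1      # never predicted: every letter + the space


def keystroke_savings(counts, test_tokens, window=5):
    """Percentage of keystrokes saved on held-out tokens."""
    baseline = sum(len(w) + 1 for w in test_tokens)  # letters + space per word
    if baseline == 0:
        return 0.0
    used = sum(keystrokes_for_word(counts, w, window) for w in test_tokens)
    return 100.0 * (baseline - used) / baseline


# Toy usage: train on one text, evaluate on held-out words.
counts = train_unigram("the cat sat on the mat".split())
print(keystroke_savings(counts, ["the", "cat"], window=1))  # 62.5
```

In this toy run, "the" is offered before any letter is typed (1 keystroke instead of 4) and "cat" is offered after typing "c" (2 keystrokes instead of 4), so 5 of 8 keystrokes are saved. A word that never appears in the training data yields no savings, which is one way to see why the choice of training corpus matters for this metric.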