Wordform- and class-based prediction of the components of German nominal compounds in an AAC system

Authors:
Marco Baroni; Johannes Matiasek; Harald Trost
Affiliations:
Austrian Research Institute for Artificial Intelligence, Vienna, Austria; Austrian Research Institute for Artificial Intelligence, Vienna, Austria; University of Vienna, Vienna, Austria
Venue:
COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 1
Year:
2002

Abstract

In word prediction systems for augmentative and alternative communication (AAC), productive word-formation processes such as compounding pose a serious problem. We present a model that predicts German nominal compounds by splitting them into their modifier and head components, instead of trying to predict them as a whole. The model is improved further by the use of class-based modifier-head bigrams constructed using semantic classes automatically extracted from a corpus. The evaluation shows that the split-compound model with class bigrams leads to an improvement in keystroke savings of more than 15% over a baseline model that does not split compounds. We also present preliminary results obtained with a word prediction model integrating compound and simple word prediction.
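The idea of class-based modifier-head bigrams can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the code below assumes the standard class-based bigram factorization, P(head | modifier) ≈ P(class(head) | class(modifier)) · P(head | class(head)). The toy compound pairs, the hand-assigned class labels, and the function names train, head_probability, and predict_heads are all hypothetical; the paper extracts its semantic classes automatically from a corpus.

```python
from collections import Counter

# Hypothetical toy data: (modifier, head) pairs from already-split compounds.
PAIRS = [
    ("Apfel", "Saft"), ("Apfel", "Kuchen"), ("Orange", "Saft"),
    ("Kirsche", "Kuchen"), ("Auto", "Bahn"), ("Auto", "Reifen"),
    ("Fahrrad", "Reifen"),
]

# Hypothetical hand-assigned classes (assumption: the paper derives
# semantic classes automatically from a corpus instead).
WORD_CLASS = {
    "Apfel": "FRUIT", "Orange": "FRUIT", "Kirsche": "FRUIT",
    "Auto": "VEHICLE", "Fahrrad": "VEHICLE",
    "Saft": "FOOD", "Kuchen": "FOOD",
    "Bahn": "TRANSPORT", "Reifen": "TRANSPORT",
}


def train(pairs, word_class):
    """Collect the counts needed for a class-based modifier-head bigram."""
    class_bigrams = Counter()      # (modifier class, head class) counts
    mod_class_counts = Counter()   # modifier class counts
    head_counts = Counter()        # head word counts
    head_class_counts = Counter()  # head class counts
    for mod, head in pairs:
        cm, ch = word_class[mod], word_class[head]
        class_bigrams[(cm, ch)] += 1
        mod_class_counts[cm] += 1
        head_counts[head] += 1
        head_class_counts[ch] += 1
    return class_bigrams, mod_class_counts, head_counts, head_class_counts


def head_probability(mod, head, model, word_class):
    """P(class(head) | class(mod)) * P(head | class(head)), unsmoothed."""
    class_bigrams, mod_class_counts, head_counts, head_class_counts = model
    cm, ch = word_class[mod], word_class[head]
    if mod_class_counts[cm] == 0 or head_class_counts[ch] == 0:
        return 0.0
    p_class = class_bigrams[(cm, ch)] / mod_class_counts[cm]
    p_word = head_counts[head] / head_class_counts[ch]
    return p_class * p_word


def predict_heads(mod, prefix, model, word_class, k=3):
    """Rank known heads that match the typed prefix, given the modifier."""
    head_counts = model[2]
    candidates = [h for h in head_counts
                  if h.lower().startswith(prefix.lower())]
    scored = [(h, head_probability(mod, h, model, word_class))
              for h in candidates]
    # Highest probability first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [h for h, p in scored[:k] if p > 0]


MODEL = train(PAIRS, WORD_CLASS)
print(predict_heads("Fahrrad", "B", MODEL, WORD_CLASS))
```

In this toy setting the pair "Fahrrad" + "Bahn" never occurs in the training data, yet "Bahn" is still proposed because the VEHICLE and TRANSPORT classes co-occur. This is the kind of generalization over unseen compounds that class bigrams are meant to provide. A real system would also need smoothing and a way to combine the class model with wordform-based statistics, which this sketch omits.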