Analysing performance in a word prediction system with multiple prediction methods

  • Authors:
  • Pertti Alvar Väyrynen;Kai Noponen;Tapio Seppänen

  • Affiliations:
  • Oulun yliopisto Sähkö-ja tietotekniikan osasto, Tietokonetekniikan laboratorio, PL 4500, 90014 Oulun yliopisto, Finland;Oulun yliopisto Sähkö-ja tietotekniikan osasto, Tietokonetekniikan laboratorio, PL 4500, 90014 Oulun yliopisto, Finland;Oulun yliopisto Sähkö-ja tietotekniikan osasto, Tietokonetekniikan laboratorio, PL 4500, 90014 Oulun yliopisto, Finland

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

In this article, we investigate, from two viewpoints, the performance of a hybrid English word prediction system that combines single word prediction with a phrase prediction utility. From the application user's point of view, measures of effort savings are what matter in word prediction. Global performance measures such as the average percentage of keystroke or character savings, however, hide rather than reveal the details of how the prediction system functions as a whole. We therefore analysed in detail the performance of phrase prediction alongside single word prediction. Our preliminary results with a corpus of 383 lexical bundles show that, from a technological viewpoint, three parameters affect the practical utility of the phrase prediction method in a hybrid prediction system: (1) the cost of selecting the appropriate prediction mode (single word or phrase prediction); (2) the token frequency of phrases in the text being predicted; and (3) the coverage of the phrasal lexicon. All three affect phrase prediction performance, but in different proportions. When 20% of search keys are ambiguous (matching both phrases and single words), the phrase frequency is 35%, and the coverage of the phrasal lexicon is 98%, the character savings for the whole text improve by 6 percentage points under optimal conditions. The system is practically useful as long as an appropriate prediction mode can be selected automatically or the cost of disambiguating the prediction mode is not too high.
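
To make the character savings measure concrete, the following is a minimal sketch that assumes the common definition of character savings as the share of characters the user did not have to type. The function name and the numbers are hypothetical and are not taken from the study; they only show how an improvement in percentage points is computed as the difference between two savings percentages.

```python
def character_savings(text_length: int, keystrokes_used: int) -> float:
    """Return the percentage of characters the user did not have to type.

    Assumed definition (not confirmed by the abstract):
    100 * (text_length - keystrokes_used) / text_length
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return 100.0 * (text_length - keystrokes_used) / text_length


# Hypothetical figures for a 1000-character text, for illustration only.
baseline = character_savings(text_length=1000, keystrokes_used=600)  # single words only
hybrid = character_savings(text_length=1000, keystrokes_used=540)    # with phrase prediction

# An improvement in percentage points is the plain difference of the two percentages.
gain_in_points = hybrid - baseline
print(f"baseline={baseline:.1f}%  hybrid={hybrid:.1f}%  gain={gain_in_points:.1f} points")
```

Note that a gain expressed in percentage points is an absolute difference: in this hypothetical case, moving from 40% to 46% savings is a 6-point gain, even though it is a 15% relative improvement.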