Text prediction is the task of suggesting text while the user is typing. Its main aim is to reduce the number of keystrokes needed to type a text. In this paper, we address the influence of text type and domain differences on text prediction quality. By training and testing our text prediction algorithm on four different text types (Wikipedia, Twitter, transcriptions of conversational speech, and FAQ) with equal corpus sizes, we found a clear effect of text type on text prediction quality: training and testing on the same text type gave percentages of saved keystrokes between 27 and 34%, while training on a different text type caused the scores to drop to between 16 and 28%. In a case study, we compared a number of training corpora for a specific data set for which training data is sparse: questions about neurological issues. We found that both text type and topic domain play a role in text prediction quality. The best performing training corpus was a set of medical pages from Wikipedia. The second-best result was obtained by leave-one-out experiments on the test questions themselves, even though this training corpus was much smaller (2,672 words) than the other corpora (1.5 million words).
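The keystroke-savings metric reported above can be illustrated with a minimal sketch: a toy frequency-based word completer, not the authors' actual algorithm. A suggestion is counted as correct when the most frequent training word matching the typed prefix equals the target word; the remaining letters are then counted as saved (accepting the suggestion is assumed to be free here, which real evaluations may charge for).

```python
from collections import Counter

def train(corpus_words):
    """Count word frequencies in the training corpus."""
    return Counter(corpus_words)

def predict(freqs, prefix):
    """Return the most frequent training word starting with `prefix`, if any."""
    candidates = [(n, w) for w, n in freqs.items() if w.startswith(prefix)]
    return max(candidates)[1] if candidates else None

def keystroke_savings(freqs, test_words):
    """Percentage of keystrokes saved over typing every letter of every word."""
    total = saved = 0
    for word in test_words:
        total += len(word)
        for i in range(1, len(word)):  # simulate typing one letter at a time
            if predict(freqs, word[:i]) == word:
                saved += len(word) - i  # remaining letters are skipped
                break
    return 100.0 * saved / total

# Hypothetical in-domain training text (illustrative only).
train_words = "how can i treat a migraine and what can cause a migraine".split()
freqs = train(train_words)
print(round(keystroke_savings(freqs, ["migraine", "cause"]), 1))  # → 69.2
```

In this toy run, "migraine" completes after one keystroke (saving 7), while "cause" is shadowed by the more frequent "can" until three letters are typed (saving 2), giving 9 of 13 keystrokes saved. Swapping in an out-of-domain training corpus lowers this score, which is the effect the paper quantifies.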