The purpose of this paper is to show that the difference between the best machine models and human models is smaller than previous results might indicate. This follows from several observations: first, the original human experiments used only a 27-character English alphabet (letters plus space), whereas most computer experiments used the full 128-character ASCII set; second, using large amounts of priming text substantially improves PPM's performance; and third, the PPM algorithm can be modified to perform better on English text. The problem of estimating the entropy of English is discussed, and the importance of training text for PPM is demonstrated, along with the further improvement obtained by "adjusting" the alphabet used. Together these improvements bring machine performance down to 1.46 bits per character.
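As a rough illustration of the quantities discussed above, the sketch below estimates bits per character for text restricted to the 27-character alphabet, using a simple adaptive order-k context model with optional priming text. It is a toy stand-in rather than the PPM variant evaluated in the paper; the function names, the add-one smoothing, and the default order are assumptions made for illustration only.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 27-character alphabet used in the human experiments


def normalize(text):
    """Map text onto the 27-character alphabet: lowercase letters plus space."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            out.append(ch)
        elif ch.isspace():
            out.append(" ")
    return "".join(out)


def bits_per_char(text, order=2, priming=""):
    """Estimate bits per character of `text` with an adaptive order-k context model.

    Each context keeps symbol counts; probabilities use add-one smoothing over the
    27-symbol alphabet (an assumption of this sketch, not the PPM escape mechanism).
    Counts from `priming` are accumulated first, mimicking the effect of priming text
    on an adaptive compressor.
    """
    counts = defaultdict(lambda: defaultdict(int))

    def prob(context, ch):
        c = counts[context]
        total = sum(c.values())
        return (c[ch] + 1) / (total + len(ALPHABET))

    priming = normalize(priming)
    text = normalize(text)
    if not text:
        return 0.0

    # Seed the model with the priming text before any coding cost is charged.
    for i, ch in enumerate(priming):
        counts[priming[max(0, i - order):i]][ch] += 1

    total_bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        total_bits -= math.log2(prob(context, ch))
        # Update counts after coding the symbol, as an adaptive coder would.
        counts[context][ch] += 1
    return total_bits / len(text)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 50
    print(f"{bits_per_char(sample, order=2):.2f} bpc (no priming)")
    print(f"{bits_per_char(sample, order=2, priming=sample):.2f} bpc (primed)")
```

Even this crude model shows the two effects the abstract highlights: restricting the alphabet lowers the per-symbol cost, and supplying priming text reduces the measured bits per character further, since the model no longer starts from empty counts.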