We present an estimate of an upper bound of 1.75 bits for the entropy of characters in printed English, obtained by constructing a word trigram model and then computing the cross-entropy between this model and a balanced sample of English text. We suggest the well-known and widely available Brown Corpus of printed English as a standard against which to measure progress in language modeling and offer our bound as the first of what we hope will be a series of steadily decreasing bounds.
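The cross-entropy calculation described above can be sketched in a few lines of Python. This is a toy illustration only: it uses add-one (Laplace) smoothing over a tiny corpus as a stand-in for the paper's interpolated word trigram model, and the function and variable names are invented for this sketch. The bits-per-character figure divides the model's total code length in bits by the number of characters scored (counting one space per word, as in character-level entropy accounting).

```python
import math
from collections import Counter

def train_trigram(words):
    # Collect trigram counts and their bigram-context counts.
    tri = Counter(zip(words, words[1:], words[2:]))
    bi = Counter(zip(words, words[1:]))
    return tri, bi, set(words)

def prob(w3, w1, w2, tri, bi, vocab):
    # Add-one smoothing — a simple stand-in for the paper's
    # interpolated trigram smoothing.
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + len(vocab))

def cross_entropy_bits_per_char(test_words, tri, bi, vocab):
    # Sum the code length -log2 p(w3 | w1, w2) over the test text,
    # then divide by the number of characters scored (word + space).
    bits = 0.0
    for w1, w2, w3 in zip(test_words, test_words[1:], test_words[2:]):
        bits += -math.log2(prob(w3, w1, w2, tri, bi, vocab))
    n_chars = sum(len(w) + 1 for w in test_words[2:])
    return bits / n_chars

words = "the cat sat on the mat".split()
tri, bi, vocab = train_trigram(words)
h = cross_entropy_bits_per_char(words, tri, bi, vocab)
```

On a real held-out sample (the paper uses the Brown Corpus), the same per-character cross-entropy serves as an upper bound on the entropy of the source, since the model's code length can never beat the true entropy on average.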