Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem concerns estimating the likelihood that a novel event will occur. Although several methods have been used, their suitability has rested on empirical evaluation rather than on a well-founded model. The authors propose applying a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is then applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods and offers a small improvement in the coding efficiency of text compression over the best method previously known.
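To make the zero-frequency problem concrete, the following minimal Python sketch compares three simple ways of estimating the probability that the next token is novel, given the tokens seen so far. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names (`novelty_estimates`, `guarded_singleton_estimate`), the choice of estimators (an add-one rule, a distinct-type rule, and a Good-Turing-style singleton fraction of the kind a Poisson view of novelty suggests), and the fallback used when the singleton estimate degenerates to 0 or 1. It should not be read as the authors' exact method.

```python
from collections import Counter


def novelty_estimates(tokens):
    """Return three illustrative estimates of P(next token is novel).

    All three formulas are assumptions chosen for illustration; the
    abstract does not state which estimators the authors compare.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one observed token")
    counts = Counter(tokens)
    q = len(counts)                                 # distinct token types seen
    t1 = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    return {
        "add_one": 1 / (n + 1),         # one virtual count reserved for novelty
        "distinct_types": q / (n + q),  # one virtual count per distinct type
        "singletons": t1 / n,           # Good-Turing-style singleton fraction
    }


def guarded_singleton_estimate(tokens):
    """Singleton estimate with a hypothetical guard against degenerate values.

    A probability of exactly 0 or 1 cannot be coded with a finite, non-zero
    code length for both outcomes, so this sketch falls back to the
    distinct-type estimate in those cases (an assumption, not the paper's
    stated modification).
    """
    estimates = novelty_estimates(tokens)
    p = estimates["singletons"]
    if p <= 0.0 or p >= 1.0:
        return estimates["distinct_types"]
    return p


if __name__ == "__main__":
    for text in ("abracadabra", "abc", "aaaa"):
        print(text, novelty_estimates(list(text)),
              guarded_singleton_estimate(list(text)))
```

The guard matters in practice because an adaptive coder calls the estimator at every step, including early steps where every token has been seen exactly once (estimate 1) and contexts where no token is a singleton (estimate 0).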