A statistical language model may be used to segment a data sequence by thresholding its instantaneous entropy. In this paper, we describe how this process works and apply it to the problem of discovering separator symbols in a text. Our results show that language models that bootstrap themselves with structure found in this way achieve lower perplexity. We conclude that these techniques may be useful in the design of generic grammatical inference systems.
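As an illustration of the thresholding idea, the following is a minimal sketch and not the method described in the paper. It assumes that "instantaneous entropy" means the Shannon entropy of the model's next-symbol distribution at each position, and it substitutes a simple character bigram model for the language model. The function names `train_bigram`, `predictive_entropy`, and `segment`, and the threshold value used below, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram(text):
    """Count next-symbol frequencies for each preceding symbol (assumed model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def predictive_entropy(counts, context):
    """Shannon entropy, in bits, of the next-symbol distribution after `context`."""
    dist = counts.get(context)
    if not dist:
        return 0.0  # unseen context: no estimate, treated as zero here
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def segment(text, counts, threshold):
    """Cut `text` after every symbol whose predictive entropy exceeds `threshold`."""
    pieces, start = [], 0
    for i, sym in enumerate(text):
        if predictive_entropy(counts, sym) > threshold:
            pieces.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        pieces.append(text[start:])
    return pieces


text = "ab ab cd cd ab cd"
counts = train_bigram(text)
print(segment(text, counts, threshold=0.5))
```

In this toy input, the space is the only symbol followed by more than one distinct successor, so it is the only position where the entropy crosses the threshold. Under these assumptions, that is the sense in which a separator symbol could be discovered without being specified in advance.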