A Bayesian mixture model for term re-occurrence and burstiness
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
We study the distributions of distances between identical elements of a random sequence (e.g., a sequence of coin tosses or die rolls). We provide methods for generating observations by statistical simulation and show, in particular, that the distributions of multiple distances obey a linear or a geometric (mixture) probability model, respectively. The results are useful for discovering certain structures in texts or other information strings.
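As a rough illustration of the simulation idea, the following minimal sketch draws an i.i.d. uniform sequence (a fair die by default) and tabulates the distance between each element and the previous occurrence of the same element. It is not the authors' procedure: the function names (`gap_distribution`, `simulate_gaps`, `geometric_pmf`), the sequence length, and the seed are all assumptions made for this example, and it covers only the simplest case of distances between consecutive occurrences. In that case, for k equally likely symbols, the distance is geometric with success probability 1/k, which the sketch compares against the empirical frequencies.

```python
import random
from collections import Counter


def gap_distribution(sequence):
    """Count distances between each element and its previous identical element."""
    last_seen = {}   # symbol -> index of its most recent occurrence
    gaps = Counter() # distance -> number of times it was observed
    for i, symbol in enumerate(sequence):
        if symbol in last_seen:
            gaps[i - last_seen[symbol]] += 1
        last_seen[symbol] = i
    return gaps


def simulate_gaps(num_symbols=6, length=100_000, seed=0):
    """Simulate an i.i.d. uniform sequence (hypothetical defaults: a fair die)."""
    rng = random.Random(seed)
    sequence = [rng.randrange(num_symbols) for _ in range(length)]
    return gap_distribution(sequence)


def geometric_pmf(d, p):
    """P(distance = d) for a geometric distribution on d = 1, 2, 3, ..."""
    return (1 - p) ** (d - 1) * p


if __name__ == "__main__":
    k = 6
    gaps = simulate_gaps(num_symbols=k)
    total = sum(gaps.values())
    print("distance  empirical  geometric(1/k)")
    for d in range(1, 11):
        print(f"{d:8d}  {gaps[d] / total:9.4f}  {geometric_pmf(d, 1 / k):14.4f}")
```

Setting `num_symbols=2` gives the coin-toss case. Natural-language text would be expected to deviate from the single geometric curve, which is where a mixture model becomes relevant.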