Finding structure via compression

  • Authors:
  • Jason L. Hutchens; Michael D. Alder

  • Affiliations:
  • University of Western Australia, Nedlands W.A., Australia; University of Western Australia, Nedlands W.A., Australia

  • Venue:
  • NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
  • Year:
  • 1998

Abstract

A statistical language model may be used to segment a data sequence by thresholding its instantaneous entropy. In this paper we describe how this process works, and we apply it to the problem of discovering separator symbols in a text. Our results show that language models which bootstrap themselves with structure found in this way undergo a reduction in perplexity. We conclude that these techniques may be useful in the design of generic grammatical inference systems.
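The segmentation idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' model: it assumes a simple order-1 (previous-symbol) Markov model with maximum-likelihood estimates, takes the "instantaneous entropy" at each position to be the entropy of the model's prediction for the next symbol, and cuts the sequence wherever that entropy exceeds a threshold. The function names, the treatment of unseen contexts, and the threshold value of 0.5 bits are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_order1(text):
    """Count which symbols follow each symbol (a hypothetical order-1 Markov model)."""
    model = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        model[prev][nxt] += 1
    return model


def instantaneous_entropy(model, context, alphabet_size):
    """Entropy (bits) of the maximum-likelihood prediction that follows `context`.

    Assumption: an unseen context is treated as maximally uncertain, log2(alphabet_size).
    """
    followers = model.get(context)  # .get does not trigger the defaultdict factory
    if not followers:
        return math.log2(alphabet_size)
    total = sum(followers.values())
    return -sum((n / total) * math.log2(n / total) for n in followers.values())


def segment(text, threshold):
    """Cut `text` before every symbol whose prediction entropy exceeds `threshold`."""
    model = train_order1(text)  # the model is trained on the text it segments
    alphabet_size = len(set(text))
    segments, start = [], 0
    for i in range(1, len(text)):
        # Uncertainty about text[i] given only the preceding symbol text[i - 1].
        if instantaneous_entropy(model, text[i - 1], alphabet_size) > threshold:
            segments.append(text[start:i])
            start = i
    segments.append(text[start:])
    return segments


if __name__ == "__main__":
    text = "dogcatpigcatdogpigdogcat"  # words concatenated with no separators
    print(segment(text, threshold=0.5))
```

On this toy input every within-word transition is deterministic, so the entropy after `d`, `o`, `c`, `a`, `p`, or `i` is 0 bits, while after a word-final `g` or `t` the next word is uncertain (about 1.37 and 1.0 bits respectively). Thresholding at 0.5 bits therefore recovers `['dog', 'cat', 'pig', 'cat', 'dog', 'pig', 'dog', 'cat']`. On real text, sparse statistics make the entropy profile much noisier, which is where richer models and a carefully chosen threshold matter.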