A regularized compression method to unsupervised word segmentation

  • Authors:
  • Ruey-Cheng Chen;Chiung-Min Tsai;Jieh Hsiang

  • Affiliations:
  • National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan

  • Venue:
  • SIGMORPHON '12 Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Languages are constantly evolving through their users due to the need to communicate more efficiently. Under this hypothesis, we formulate unsupervised word segmentation as a regularized compression process. We reduce this process to an optimization problem, and propose a greedy inclusion solution. Preliminary test results on the Bernstein-Ratner corpus and Bakeoff-2005 show that the our method is comparable to the state-of-the-art in terms of effectiveness and efficiency.