Improving word segmentation by simultaneously learning phonotactics

  • Authors: Daniel Blanchard; Jeffrey Heinz
  • Affiliations: University of Delaware; University of Delaware
  • Venue: CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
  • Year: 2008

Abstract

The most accurate unsupervised word segmentation systems that are currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore the utility of using bigram and trigram phonotactic models by enhancing Brent's (1999) MBDP-1 algorithm. The results show the improved MBDP-Phon model outperforms other unsupervised word segmentation systems (e.g., Brent, 1999; Venkataraman, 2001; Goldwater, 2007).
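
To illustrate the contrast the abstract draws between unigram and higher-order phonotactic models, the following is a minimal sketch of a phoneme n-gram scorer. It is not the MBDP-Phon algorithm itself: the function names, the `#` boundary symbol, the toy lexicon, and the add-one smoothing are all assumptions made for this example. It only shows that a unigram model cannot distinguish two orderings of the same phonemes, while a bigram model can.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol (an assumption of this sketch)

def train_ngram_counts(words, n):
    """Count phoneme n-grams and their (n-1)-symbol contexts over a lexicon."""
    ngrams, contexts = Counter(), Counter()
    for word in words:
        # Pad so that the first phoneme and the final boundary have full contexts.
        padded = [BOUNDARY] * (n - 1) + list(word) + [BOUNDARY]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts

def word_log_prob(word, ngrams, contexts, n, alphabet_size):
    """Add-one smoothed log probability of a word's phoneme sequence."""
    padded = [BOUNDARY] * (n - 1) + list(word) + [BOUNDARY]
    logp = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        numerator = ngrams[context + (padded[i],)] + 1
        denominator = contexts[context] + alphabet_size
        logp += math.log(numerator / denominator)
    return logp

# Toy lexicon; each character stands for one phoneme (illustrative data only).
lexicon = ["kat", "kab", "tak"]
alphabet_size = len(set("".join(lexicon))) + 1  # phonemes plus the boundary symbol

uni_counts = train_ngram_counts(lexicon, 1)
bi_counts = train_ngram_counts(lexicon, 2)

# "kat" and "tka" contain the same phonemes in a different order.
uni_kat = word_log_prob("kat", *uni_counts, 1, alphabet_size)
uni_tka = word_log_prob("tka", *uni_counts, 1, alphabet_size)
bi_kat = word_log_prob("kat", *bi_counts, 2, alphabet_size)
bi_tka = word_log_prob("tka", *bi_counts, 2, alphabet_size)

print("unigram:", uni_kat, uni_tka)  # equal up to floating-point rounding
print("bigram: ", bi_kat, bi_tka)    # "kat" scores higher than "tka"
```

Under the unigram model both strings receive the same score because only phoneme frequencies matter; the bigram model prefers "kat" because its phoneme transitions are attested in the toy lexicon. Passing `n=3` to the same functions gives a trigram variant of the sketch.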