A Comparison of Data-Driven Automatic Syllabification Methods

  • Authors:
  • Connie R. Adsett; Yannick Marchand

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, Halifax, Canada B3H 1W5 and Institute for Biodiagnostics (Atlantic), National Research Council Canada, Halifax, Canada B3H 3A7; Faculty of Computer Science, Dalhousie University, Halifax, Canada B3H 1W5 and Institute for Biodiagnostics (Atlantic), National Research Council Canada, Halifax, Canada B3H 3A7

  • Venue:
  • SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2009

Abstract

Although automatic syllabification is an important component in several natural language tasks, little has been done to compare the results of data-driven methods on a wide range of languages. This article compares the results of five data-driven syllabification algorithms (Hidden Markov Support Vector Machines, IB1, Liang's algorithm, the Look Up Procedure, and Syllabification by Analogy) on nine European languages in order to determine which algorithm performs best overall. Findings show that all algorithms achieve a mean word accuracy across all lexicons of over 90%. Syllabification by Analogy performs better than the other algorithms tested, with a mean word accuracy of 96.84% (standard deviation of 2.93), whereas Liang's algorithm, the standard for hyphenation (used in TeX), produces the second-best results, with a mean of 95.67% (standard deviation of 5.70).
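
The figures reported above are word accuracies averaged over lexicons. As a rough illustration only, the sketch below assumes that a word counts as correct when its predicted syllabification matches the reference exactly, and that the mean and standard deviation are taken over one accuracy value per lexicon. The function names, the hyphen-delimited word format, and the use of the sample standard deviation are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, stdev


def word_accuracy(predicted, reference):
    """Percentage of words whose predicted syllabification equals the reference.

    Both arguments are equal-length lists of strings, with syllable
    boundaries marked by hyphens (a hypothetical format for this sketch).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one word is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def summarize(per_lexicon_accuracy):
    """Mean and sample standard deviation over per-lexicon accuracies.

    per_lexicon_accuracy maps a lexicon name to its word accuracy (%).
    Whether the paper used the sample or the population standard
    deviation is not stated in the abstract; sample is assumed here.
    """
    values = list(per_lexicon_accuracy.values())
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Toy data for illustration; not taken from the paper's lexicons.
    predicted = ["syl-la-ble", "wa-ter"]
    reference = ["syl-la-ble", "wat-er"]
    print(word_accuracy(predicted, reference))
```

Under this definition a single misplaced boundary makes the whole word incorrect, so word accuracy is a stricter measure than a per-boundary score.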