Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian

  • Authors:
  • Connie R. Adsett;Yannick Marchand;Vlado Keselj

  • Affiliations:
  • Institute for Biodiagnostics (Atlantic), National Research Council Canada, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7 and Faculty of Computer Science, Dalhousie Universit ...;Institute for Biodiagnostics (Atlantic), National Research Council Canada, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7 and Faculty of Computer Science, Dalhousie Universit ...;Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1W5

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2009

Abstract

Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language, where data-driven algorithms have been shown to outperform rule-based methods. It is possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-based automatic syllabification systems and two data-driven automatic syllabification systems (Syllabification by Analogy and the Look-Up Procedure) are compared on Italian, a language with lower syllabic complexity. When performance was compared using a lexicon containing 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule set correctly syllabified 89.77% of the words. These results show that data-driven methods can also outperform rule-based methods on Italian, a language of low syllabic complexity.
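
The abstract reports results as word accuracy but does not define the measure. The sketch below assumes it means the percentage of words whose entire predicted syllabification matches the reference exactly. The function name `word_accuracy`, the hyphen-delimited format, and the four toy entries are illustrative assumptions, not material from the paper or its lexicon.

```python
def word_accuracy(predicted, reference):
    """Return the percentage of words syllabified exactly as in the reference.

    Assumption: a word counts as correct only if every syllable boundary
    matches; a single misplaced boundary makes the whole word wrong.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical toy data (hyphens mark syllable boundaries).
reference = ["ca-sa", "li-bro", "stra-da", "al-to"]
predicted = ["ca-sa", "lib-ro", "stra-da", "al-to"]  # second word has a wrong boundary

print(f"{word_accuracy(predicted, reference):.2f}% word accuracy")
```

Under this definition the measure is strict: one misplaced boundary makes the whole word incorrect, so it gives no partial credit for the boundaries that were placed correctly.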