Evaluation of automatic break insertion for an agglutinative and inflected language

  • Authors:
  • Eva Navas; Inmaculada Hernáez; Iñaki Sainz

  • Affiliations:
  • Departamento de Electrónica y Telecomunicaciones, University of the Basque Country, Alda. Urquijo s/n, 48013 Bilbao, Spain (all three authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

This paper presents the evaluation of automatic break insertion for standard Basque. Basque is an agglutinative and inflected language, and part-of-speech (POS) features, widely used for other languages, are not sufficient to predict accurately where breaks should be inserted in the text. Other morpho-syntactic features, such as grammatical case and information about syntagms, have also been taken into account. Using a textual corpus gathered specifically for this study, from which sentence-internal punctuation marks were removed, classification and regression trees (CARTs) were trained to predict break locations. After parameter selection was applied to the whole morpho-syntactic feature set, the best features were used to build two CARTs: T1, which gives the same importance to deletion and insertion errors, and T2, which tries to minimize insertion errors. The objective evaluation of the break insertion algorithms gives a κ statistic of 0.518 and an F-measure of 0.757 for tree T1. The algorithms were also evaluated subjectively: although T1 obtained better objective measures, it made more serious errors than T2.
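The objective scores above compare predicted breaks with reference breaks. The following Python is a minimal sketch of how such scores can be computed. It assumes that each word juncture is a binary decision (1 = break, 0 = no break), that the kappa (κ) statistic is Cohen's kappa, and that F is the balanced F-measure over break positions. The abstract does not state the exact variants used. The function name `break_metrics` and the toy sequences are hypothetical. Insertion errors (a break predicted where the reference has none) and deletion errors (a reference break that was missed) are counted separately, because these are the two error types that distinguish T1 from T2.

```python
from typing import Dict, Sequence


def break_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: score predicted breaks against reference breaks.

    Each position is one word juncture: 1 = break, 0 = no break.
    Assumes Cohen's kappa and the balanced F-measure over the break class.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    n = len(reference)
    if n == 0:
        raise ValueError("sequences must not be empty")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly placed breaks
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # insertion errors
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # deletion errors
    tn = n - tp - fp - fn                               # correctly omitted breaks

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_obs = (tp + tn) / n
    p_ref_break = (tp + fn) / n
    p_pred_break = (tp + fp) / n
    p_exp = p_ref_break * p_pred_break + (1 - p_ref_break) * (1 - p_pred_break)
    # Degenerate case: both sequences use one identical label, so agreement is total.
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0

    return {
        "insertions": fp,
        "deletions": fn,
        "precision": precision,
        "recall": recall,
        "f": f,
        "kappa": kappa,
    }


if __name__ == "__main__":
    ref = [0, 1, 0, 0, 1, 0, 0, 1]
    pred = [0, 1, 0, 1, 1, 0, 0, 0]
    print(break_metrics(ref, pred))
```

For this toy pair there is one insertion and one deletion. The result is F = 2/3 and κ = 7/15 ≈ 0.467. Kappa is lower than F because most junctures are "no break", and agreement on that majority label is discounted as chance agreement. In these terms, a tree like T2 aims to drive the `insertions` count down, even if this costs some deletions.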