Data-Driven Part-of-Speech Tagging of Kiswahili

  • Authors:
  • Guy De Pauw; Gilles-Maurice de Schryver; Peter W. Wagacha

  • Affiliations:
  • CNTS – Language Technology Group, University of Antwerp, Belgium; African Languages and Cultures, Ghent University, Belgium; School of Computing and Informatics, University of Nairobi, Kenya

  • Venue:
  • TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

In this paper we present experiments with data-driven part-of-speech taggers trained and evaluated on the annotated Helsinki Corpus of Swahili. Using four of the current state-of-the-art data-driven taggers, TnT, MBT, SVMTool and MXPOST, we find the latter to be the most accurate tagger for the Kiswahili dataset. We further improve on the performance of the individual taggers by combining them into a committee of taggers. We observe that the more naive combination methods, like the novel plural voting approach, outperform more elaborate schemes like cascaded classifiers and weighted voting. This paper is the first publication to present experiments on data-driven part-of-speech tagging for Kiswahili and Bantu languages in general.
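
To make the committee idea concrete, the following is a minimal sketch of plain majority voting over the outputs of several taggers. It is not the plural voting method of the paper, whose details the abstract does not give. The function name `majority_vote`, the `tie_breaker` parameter, and the toy tags are assumptions made for illustration only; they are not taken from the paper or from any of the taggers' interfaces.

```python
from collections import Counter


def majority_vote(predictions, tie_breaker):
    """Combine per-token tags from several taggers by simple majority.

    predictions: dict mapping a tagger name to its list of tags,
                 one tag per token, all lists of equal length.
    tie_breaker: name of the tagger whose tag is used when the
                 top vote count is shared (an assumed policy).
    """
    names = list(predictions)
    length = len(predictions[names[0]])
    if any(len(tags) != length for tags in predictions.values()):
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for i in range(length):
        # Count how many taggers proposed each tag for token i.
        votes = Counter(predictions[name][i] for name in names)
        top_count = max(votes.values())
        winners = [tag for tag, count in votes.items() if count == top_count]
        if len(winners) == 1:
            combined.append(winners[0])
        else:
            # Tie: fall back to the designated tagger's choice.
            combined.append(predictions[tie_breaker][i])
    return combined


# Toy, made-up tagger outputs for a three-token sentence (illustration only).
predictions = {
    "TnT": ["N", "V", "ADJ"],
    "MBT": ["N", "N", "ADJ"],
    "SVMTool": ["N", "V", "N"],
    "MXPOST": ["PRON", "V", "N"],
}
combined = majority_vote(predictions, tie_breaker="MXPOST")
print(combined)
```

In this sketch the third token is a two-against-two tie, so the tag of the designated tie-breaker is used. Choosing the individually most accurate tagger as tie-breaker is one reasonable policy, but it is an assumption here rather than something stated in the abstract.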