Recursive decompounding in Afrikaans

  • Authors:
  • Tilla Fick;Chris Swanepoel

  • Affiliations:
  • Department of Decision Sciences, University of South Africa, Pretoria, South Africa;Department of Decision Sciences, University of South Africa, Pretoria, South Africa

  • Venue:
  • TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
  • Year:
  • 2011

Abstract

An algorithm has been developed to decompose compound words in Afrikaans. This data-driven technique recursively uses an extensive list of Afrikaans words in the decompounding process. String fitting from the beginning and end of words forms the basis of the process. It is supplemented by sublists of short words that may occur only at the beginning or only at the end of words, and by lists of prefixes and suffixes. Applying the algorithm to the original lexicon of 182 433 words resulted in an accuracy of 90.2%, a precision of 99.9% and a recall of 83.6%.
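
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of recursive, lexicon-based decompounding. It is not the authors' implementation. The function name `split_compound`, the `min_len` parameter and the five-word toy lexicon are assumptions made for illustration. The sketch fits a known word at the beginning of the string and recurses on the remainder, and it omits the sublists of short words and the prefix and suffix lists mentioned above.

```python
from typing import List, Optional, Set

# Toy lexicon (an assumption for illustration; the paper uses 182 433 words).
LEXICON: Set[str] = {
    "vrag", "motor", "bestuurder", "vragmotor", "vragmotorbestuurder",
}


def split_compound(word: str, lexicon: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into lexicon words, or return None if it cannot be covered.

    Hypothetical helper: tries each split point from the left, keeps the first
    head that is a lexicon word, and recursively decompounds the tail.
    """
    # Both parts must be at least `min_len` characters long (assumed threshold).
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    # No valid split found: the word is either a simplex lexicon word or unknown.
    return [word] if word in lexicon else None


print(split_compound("vragmotorbestuurder", LEXICON))
print(split_compound("motor", LEXICON))
print(split_compound("xyzabc", LEXICON))
```

With this toy lexicon, `vragmotorbestuurder` is split into `vrag`, `motor` and `bestuurder`, a simplex word such as `motor` is returned unchanged, and an unknown string yields `None`. A plain left-to-right fit like this tends to over-split on a large lexicon, which is presumably why the described method restricts where short words may occur.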