Sentence boundary detection in Turkish

  • Authors:
  • B. Taner Dinçer;Bahar Karaoğlan

  • Affiliations:
  • Uluslararası Bilgisayar Enstitüsü, Ege Üniversitesi, Bornova, İzmir, Türkiye;Uluslararası Bilgisayar Enstitüsü, Ege Üniversitesi, Bornova, İzmir, Türkiye

  • Venue:
  • ADVIS'04: Proceedings of the Third International Conference on Advances in Information Systems
  • Year:
  • 2004

Abstract

In this paper, we describe a method for sentence boundary detection in Turkish. The method exploits simple heuristic knowledge of Turkish syllabication and phonetic rules to disambiguate dots. The algorithm achieves a test accuracy of 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing EOS (end-of-sentence) dots from dots used for other purposes.
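
The abstract does not list the actual rules, so the following Python sketch is only an illustration of what a lexicon-free, syllable-based dot disambiguator might look like; it is not the authors' algorithm. It assumes three hypothetical heuristics: a token with no vowel cannot form a Turkish syllable and is probably an abbreviation, a single letter is probably an initial, and a dot followed by a word that does not start with an uppercase letter probably does not end a sentence. The names `has_vowel`, `is_eos_dot`, and `split_sentences` are invented for this example.

```python
# Illustrative sketch only: the rules below are assumptions, not the paper's rules.

# The eight Turkish vowels in both cases; every Turkish syllable contains one.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")


def has_vowel(token: str) -> bool:
    """Return True if the token contains at least one Turkish vowel."""
    return any(ch in TURKISH_VOWELS for ch in token)


def is_eos_dot(prev_word: str, next_word: "str | None") -> bool:
    """Guess whether the dot that follows prev_word ends a sentence."""
    # A dot on the last token of the text is treated as a sentence end.
    if next_word is None:
        return True
    # Assumed rule: a sentence is followed by a word starting in uppercase.
    if not next_word[0].isupper():
        return False
    if prev_word.isalpha():
        # Assumed rule: a single letter before the dot is likely an initial.
        if len(prev_word) == 1:
            return False
        # Assumed rule: no vowel means no syllable, so likely an abbreviation.
        if not has_vowel(prev_word):
            return False
    return True


def split_sentences(text: str) -> list:
    """Split whitespace-tokenized text at dots judged to be EOS dots."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith("."):
            next_word = tokens[i + 1] if i + 1 < len(tokens) else None
            if is_eos_dot(token[:-1], next_word):
                sentences.append(" ".join(current))
                current = []
    if current:  # trailing text without a final dot
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample = "Dr. Ahmet geldi. Kitap, defter vb. aldı. B. Taner de oradaydı."
    for sentence in split_sentences(sample):
        print(sentence)
```

A sketch this simple has obvious gaps: an abbreviation that contains a vowel (for example "Prof.") would be treated as a sentence end, which is the kind of case where additional phonetic rules, such as those the paper refers to, would be needed.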