Integrating morphology with multi-word expression processing in Turkish

  • Authors:
  • Kemal Oflazer;Özlem Çetinoğlu;Bilge Say

  • Affiliations:
  • Sabanci University, Istanbul, Turkey;Sabanci University, Istanbul, Turkey;Middle East Technical University, Ankara, Turkey

  • Venue:
  • MWE '04 Proceedings of the Workshop on Multiword Expressions: Integrating Processing
  • Year:
  • 2004

Abstract

This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named entities, Turkish uses quite a wide range of semi-lexicalized and non-lexicalized collocations. After an overview of the relevant aspects of Turkish, we describe the multi-word expressions we handle. We then summarize the computational setting, in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. Finally, we present results from runs over a large corpus and a small gold-standard corpus.
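The abstract outlines a pipeline in which tokenization and morphological analysis come before multi-word expression extraction. The Python sketch below illustrates that general ordering only. It is not the authors' system. The toy lexicon, the `root+Tag+Tag` analysis strings, the fixed-expression table, and the output tag `+Adverb+AsSoonAs` are hypothetical stand-ins. The code shows two kinds of matching. A lexicalized expression is matched on surface forms, for example "hiç olmazsa" ('at least'). A semi-lexicalized pattern is matched on morphological analyses instead: a positive aorist verb followed by the negative aorist of the same root, as in "gelir gelmez" ('as soon as (he/she) comes').

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> one analysis in a root+Tag style.
# A real morphological analyzer would typically return several ambiguous
# analyses per token; this sketch assumes a single, already chosen analysis.
TOY_LEXICON = {
    "gelir": "gel+Verb+Pos+Aor+A3sg",
    "gelmez": "gel+Verb+Neg+Aor+A3sg",
    "eve": "ev+Noun+A3sg+Pnon+Dat",
    "gitti": "git+Verb+Pos+Past+A3sg",
    "hiç": "hiç+Adverb",
    "olmazsa": "ol+Verb+Neg+Aor+Cond+A3sg",
}

# Hypothetical table of fixed (lexicalized) expressions, keyed by surface forms.
FIXED_MWES = {("hiç", "olmazsa"): "hiç_olmazsa+Adverb"}


@dataclass
class Token:
    surface: str
    analysis: str


def tokenize(text: str) -> list[str]:
    # Whitespace tokenization only; punctuation handling is omitted.
    return text.split()


def analyze(words: list[str]) -> list[Token]:
    # Look each word up in the toy lexicon; unknown words get a marker tag.
    return [Token(w, TOY_LEXICON.get(w, w + "+Unknown")) for w in words]


def root_and_tags(analysis: str) -> tuple[str, list[str]]:
    root, *tags = analysis.split("+")
    return root, tags


def is_aorist_pair(a: Token, b: Token) -> bool:
    # Assumed semi-lexicalized pattern: V+Pos+Aor followed by the same V+Neg+Aor.
    ra, ta = root_and_tags(a.analysis)
    rb, tb = root_and_tags(b.analysis)
    return (ra == rb
            and ta[:3] == ["Verb", "Pos", "Aor"]
            and tb[:3] == ["Verb", "Neg", "Aor"])


def combine_mwes(tokens: list[Token]) -> list[Token]:
    """Greedy left-to-right pass that merges two-token MWEs into one token."""
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            key = (a.surface, b.surface)
            if key in FIXED_MWES:  # lexicalized: match on surface forms
                out.append(Token(a.surface + "_" + b.surface, FIXED_MWES[key]))
                i += 2
                continue
            if is_aorist_pair(a, b):  # semi-lexicalized: match on analyses
                root, _ = root_and_tags(a.analysis)
                # Hypothetical output tag for the combined adverbial form.
                out.append(Token(a.surface + "_" + b.surface,
                                 root + "+Verb+Adverb+AsSoonAs"))
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


if __name__ == "__main__":
    for sentence in ["gelir gelmez eve gitti", "hiç olmazsa gelir"]:
        for tok in combine_mwes(analyze(tokenize(sentence))):
            print(tok.surface, tok.analysis)
```

The pattern check runs on analyses rather than surface strings. This is what allows a single rule to cover any verb root, whereas the fixed table must list each expression explicitly. A fuller treatment would also need to handle ambiguous analyses and longer expressions, and it would need to decide what happens when two candidate matches overlap.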