Simple Morpheme Labelling in Unsupervised Morpheme Analysis

  • Authors: Delphine Bernhard

  • Affiliations: Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Germany

  • Venue: Advances in Multilingual and Multimodal Information Retrieval
  • Year: 2008

Abstract

This paper describes a system for unsupervised morpheme analysis and the results it obtained at Morpho Challenge 2007. The system takes a plain list of words as input and returns a list of labelled morphemic segments for each word. The segments are obtained by an unsupervised learning process that can be applied directly to different natural languages. In Competition 1 (evaluation of the morpheme analyses), results are better for English, Finnish and German than for Turkish. In Competition 2 (information retrieval), the best results are obtained when indexing uses Okapi (BM25) weighting over all morphemes except those on an automatic stop list made up of the most common morphemes.
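
To make the retrieval setting concrete, the following is a minimal sketch of BM25 scoring over morpheme-segmented documents with a frequency-based stop list. It is not the system's implementation: the function names, the toy segmentations, the stop-list size `top_n`, the parameter values `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_stop_list(docs, top_n):
    """Return the top_n most frequent morphemes in the collection (assumed criterion)."""
    counts = Counter(m for doc in docs for m in doc)
    return {m for m, _ in counts.most_common(top_n)}


def bm25_scores(query, docs, stop_list, k1=1.2, b=0.75):
    """Score each document against the query with BM25, skipping stop-listed morphemes."""
    # Drop stop-listed morphemes before indexing.
    filtered = [[m for m in doc if m not in stop_list] for doc in docs]
    n_docs = len(filtered)
    avg_len = sum(len(d) for d in filtered) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each morpheme.
    df = Counter(m for d in filtered for m in set(d))

    scores = []
    for d in filtered:
        tf = Counter(d)
        score = 0.0
        for m in query:
            if m in stop_list or m not in tf:
                continue
            # Non-negative IDF variant (an assumption, not taken from the paper).
            idf = math.log(1 + (n_docs - df[m] + 0.5) / (df[m] + 0.5))
            norm = tf[m] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[m] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical morpheme analyses; the labels are illustrative only.
docs = [
    ["walk", "+ed", "dog", "+s"],
    ["walk", "+ing", "cat", "+s"],
    ["talk", "+ed", "cat", "+s"],
]
stop_list = build_stop_list(docs, top_n=1)
scores = bm25_scores(["dog", "+s"], docs, stop_list)
```

In this toy collection the stop list contains only the morpheme that occurs in every document, so it contributes nothing to the ranking and only the first document receives a non-zero score for the query.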