A Search Engine for Morphologically Complex Languages

  • Authors:
  • Udo Hahn; Martin Honeck; Stefan Schulz

  • Venue:
  • IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
  • Year:
  • 2001

Abstract

Document retrieval in natural languages with a rich morphology, particularly in terms of derivation and (single-word) composition, suffers from serious performance degradation under the direct query-term-to-text-word matching paradigm that underlies the vast majority of current search engines. We propose an alternative approach in which morphologically complex word forms, which appear in the query as well as in the documents, are segmented into relevant subwords (such as stems, named entities, and acronyms) and are subsequently submitted to the matching procedure. We evaluate our approach with the AltaVista™ Search Engine on a large medical document collection.
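The abstract does not spell out how the segmentation is done. The following is a minimal sketch under stated assumptions, not the authors' method: it uses a tiny hypothetical subword lexicon (`LEXICON`), splits each word into the fewest lexicon entries via dynamic programming, and ranks documents by the number of query subwords they share in an inverted index. The names `segment`, `build_index` and `search` are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical toy subword lexicon (assumption for illustration only; the
# paper's actual lexicon and segmentation rules are not reproduced here).
LEXICON = {"herz", "muskel", "entzünd", "ung", "hepat", "itis", "leber"}


def segment(word, lexicon=LEXICON):
    """Split a word into lexicon subwords, preferring the fewest segments.

    Falls back to the whole lowercased word if no complete segmentation exists.
    """
    w = word.lower()
    n = len(w)
    # best[i] holds the shortest segmentation of w[:i], or None if w[:i]
    # cannot be fully covered by lexicon entries.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and w[j:i] in lexicon:
                candidate = best[j] + [w[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [w]


def build_index(docs):
    """Map each subword to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            for sub in segment(token):
                index[sub].add(doc_id)
    return index


def search(query, index):
    """Rank documents by how many distinct query subwords they contain."""
    scores = defaultdict(int)
    subwords = {sub for token in query.split() for sub in segment(token)}
    for sub in subwords:
        for doc_id in index.get(sub, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {"d1": "Herzmuskelentzündung", "d2": "Muskel Entzündung", "d3": "Leber"}
index = build_index(docs)
print(search("Muskelentzündung", index))
```

With this toy lexicon, the query "Muskelentzündung" matches the compound "Herzmuskelentzündung" in `d1`, which a whole-word matcher would miss because the two surface forms never coincide. A realistic system would also need to weight or filter short affixes such as "ung", which here count as much as content stems.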