From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software

  • Authors:
  • Riitta Alkula

  • Affiliations:
  • Tieto Enator Corporation, Finland

  • Venue:
  • Information Retrieval
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper deals with linguistic processing and retrieval techniques in fulltext databases. Special attention is focused on the characteristics of highly inflectional languages, and how morphological structure of a language should be taken into account, when designing and developing information retrieval systems. Finnish is used as an example of a language, which has a more complicated inflectional structure than the English language. In the FULLTEXT project, natural language analysis modules for Finnish were incorporated into the commercial BASIS information retrieval system, which is based on inverted files and Boolean searching. Several test databases were produced, each using one or two Finnish morphological analysis programs.