Unsupervised learning of morphology for building lexicon for a highly inflectional language

  • Authors:
  • Utpal Sharma;Jugal Kalita;Rajib Das

  • Affiliations:
  • Tezpur University, Assam, India;University of Colorado, Colorado Springs, CO;Tezpur University, Assam, India

  • Venue:
  • MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
  • Year:
  • 2002

Abstract

Words play a crucial role in natural language understanding tasks such as syntactic and semantic processing. A natural language understanding system usually either already knows the words that appear in a text or can automatically learn relevant information about a word upon encountering it. A capable system, human or machine, typically knows a subset of a language's vocabulary, together with morphological rules for determining the attributes of words it has not seen before. Developing a knowledge base of legal words and morphological rules is therefore an important task in computational linguistics. In this paper, we describe initial experiments with an approach based on unsupervised learning of morphology from a text corpus developed especially for this purpose. The method offers a convenient way to create a dictionary and a morphological rule base, and is especially suitable for highly inflectional languages such as Assamese. Assamese is a major Indian language of the Indic branch of the Indo-European family of languages and is used by around 15 million people.