A Data mining approach for resolving cases of Multiple Parsing in Machine Aided Translation of Indian Languages

  • Authors:
  • S. D. Samantaray

  • Affiliations:
  • College of Technology Pantnagar

  • Venue:
  • ITNG '07 Proceedings of the International Conference on Information Technology
  • Year:
  • 2007

Abstract

Resolving cases of multiple parsing is one of the biggest problems in Machine Aided Translation (MAT) systems, and producing an unambiguous parse is a major challenge for parsers developed for Indian languages. The paper discusses a data mining based approach for developing Vector Charts that can be used to resolve multiple parses and noun-verb attachment problems so that Indian languages are translated correctly. In this approach, a vibhakti-chart (a set of subcategorization frames) is obtained from a text corpus. The vibhakti-chart plays a pivotal role in defining association rules that can be used more effectively for multiple parse disambiguation. It can also be used to find agreement rules between verbs and noun-groups with the proper vibhaktis, to find how many noun-groups have ambiguous grouping with or without vibhakti-charts, and to determine whether the verbs obtained can be grouped into classes. An Indian language text corpus was used for the work: a collection of more than 250 stories with a total size of about 100,000 words. Anusaaraka (a language follower) was used to perform morphological analysis of the corpus and to obtain the vibhaktis of nouns. In the process, several filters were developed for the necessary data cleaning.
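The abstract does not specify how the vibhakti-chart is stored or how the association rules are scored, so the following is only a minimal sketch under stated assumptions. It assumes each clause has already been reduced to a verb root plus the vibhaktis of its noun-groups, that a frame is the order-insensitive tuple of those vibhaktis, and that rules of the form "verb implies frame" are filtered by the standard support and confidence measures. The function names, the thresholds, the use of `"0"` for a noun-group with no overt marker, and the toy clauses are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Invented toy data (not from the paper's corpus): each clause is
# (verb_root, [vibhakti of each noun-group]); "0" marks no overt vibhakti.
clauses = [
    ("de", ["ne", "ko", "0"]),
    ("de", ["ne", "ko", "0"]),
    ("de", ["0", "ko", "0"]),
    ("jA", ["0", "ko"]),
    ("jA", ["0", "se"]),
]

def build_vibhakti_chart(clauses):
    """Count, for each verb, how often each vibhakti frame occurs."""
    chart = defaultdict(Counter)
    for verb, vibhaktis in clauses:
        frame = tuple(sorted(vibhaktis))  # assumption: word order is ignored
        chart[verb][frame] += 1
    return chart

def association_rules(chart, min_support=0.3, min_confidence=0.5):
    """Return (verb, frame, support, confidence) for rules 'verb -> frame'."""
    total = sum(sum(frames.values()) for frames in chart.values())
    rules = []
    for verb, frames in chart.items():
        verb_total = sum(frames.values())
        for frame, count in frames.items():
            support = count / total          # share of all clauses
            confidence = count / verb_total  # share of this verb's clauses
            if support >= min_support and confidence >= min_confidence:
                rules.append((verb, frame, support, confidence))
    return rules

chart = build_vibhakti_chart(clauses)
rules = association_rules(chart)
for verb, frame, support, confidence in rules:
    print(verb, frame, round(support, 2), round(confidence, 2))
```

In a disambiguation setting, a candidate parse whose verb and noun-group vibhaktis match a high-confidence rule could be ranked above one that does not; how the paper actually applies its rules is not stated in the abstract.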