Stepwise mining of multi-word expressions in Hindi

  • Authors: R. Mahesh K. Sinha
  • Affiliations: Indian Institute of Technology, Kanpur, India
  • Venue: MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
  • Year: 2011

Abstract

Multi-word expressions (MWEs) play an important role in all tasks that involve natural language processing. MWEs in Hindi are quite varied, and many are of types not encountered in English. In this paper, we examine the different types of MWEs encountered in Hindi, many of which have not received adequate attention from investigators. For example, 'vaalaa' constructs, doublets (word-pairs), replication, and a variety of verb group forms have not been explored as MWEs. We examine these MWEs from a machine translation viewpoint. Many of them are used frequently in day-to-day conversation and informal communication but are encountered less often in a formal textual corpus. Most conventional statistical methods for MWE identification use a corpus with limited linguistic cues; these methods are found to be inadequate for detecting all types of MWEs that exist in real life. We therefore present a stepwise methodology for mining Hindi MWEs using linguistic knowledge. The interpretation and representation of some of these MWEs from a machine translation perspective are also explored.