Identification of reduplicated multiword expressions using CRF

  • Authors:
  • Kishorjit Nongmeikapam; Dhiraj Laishram; Naorem Bikramjit Singh; Ngariyanbam Mayekleima Chanu; Sivaji Bandyopadhyay

  • Affiliations:
  • Dept. of Computer Sc. & Engg., Manipur Institute of Technology, Manipur University, Imphal, India; Dept. of Computer Sc. & Engg., Manipur Institute of Technology, Manipur University, Imphal, India; Dept. of Computer Sc. & Engg., Manipur Institute of Technology, Manipur University, Imphal, India; Dept. of Education Technology, Kanan Devi Memorial College of Education, Imphal, India; Dept. of Computer Sc. & Engg., Jadavpur University, Jadavpur, Kolkata, India

  • Venue:
  • CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
  • Year:
  • 2011

Abstract

This paper addresses the identification of Reduplicated Multiword Expressions (RMWEs), a task that is important for natural language applications such as Machine Translation and Information Retrieval. In the present work, reduplicated MWEs are identified in Manipuri-language texts using a Conditional Random Field (CRF) tool. Manipuri is highly agglutinative, and reduplication is frequent in the language. The features selected for running the CRF tool include stem words, the number of suffixes, the number of prefixes, the prefixes in the word, the suffixes in the word, the Part of Speech (POS) of the surrounding words, the surrounding stem words, the length of the word, word frequency and a digit feature. Experimental results show the effectiveness of the proposed approach, with overall average Recall, Precision and F-Score values of 92.91%, 91.90% and 92.40%, respectively.
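
To make the feature list in the abstract concrete, the following is a minimal Python sketch of how such per-token features might be assembled before being handed to a CRF toolkit. It is not the authors' implementation: the affix inventories (`PREFIXES`, `SUFFIXES`), the greedy `strip_affixes` stemmer, the minimum stem length, the one-word context window and the feature names are all assumptions made for illustration, and the affixes are English-like placeholders rather than Manipuri morphemes. The `f_score` helper only checks that the reported F-Score is consistent with the harmonic mean of the reported Precision and Recall; the paper's values are described as overall averages, so this is a consistency check and not a reconstruction of how they were computed.

```python
from collections import Counter

# Placeholder affix inventories (assumptions, NOT Manipuri affixes from the paper).
PREFIXES = ("un", "re")
SUFFIXES = ("ed", "es", "s")
MIN_STEM = 3  # assumed minimum stem length, to avoid stripping a word to nothing


def strip_affixes(word):
    """Greedily strip known affixes; return (stem, prefixes, suffixes)."""
    prefixes, suffixes = [], []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                prefixes.append(p)
                stem = stem[len(p):]
                stripped = True
                break
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
                suffixes.insert(0, s)  # keep suffixes in surface order
                stem = stem[:-len(s)]
                stripped = True
                break
    return stem, prefixes, suffixes


def token_features(tokens, pos_tags, i, freq):
    """Build a feature dict for token i, mirroring the abstract's feature list."""
    word = tokens[i]
    stem, pre, suf = strip_affixes(word)
    feats = {
        "stem": stem,
        "n_prefixes": len(pre),
        "n_suffixes": len(suf),
        "prefixes": "+".join(pre),
        "suffixes": "+".join(suf),
        "length": len(word),
        "frequency": freq[word],
        "has_digit": any(ch.isdigit() for ch in word),
    }
    # POS tags and stems of the immediate neighbours (window size is an assumption).
    for off in (-1, 1):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"pos[{off:+d}]"] = pos_tags[j]
            feats[f"stem[{off:+d}]"] = strip_affixes(tokens[j])[0]
    return feats


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["unpacked", "boxes", "42"]
    pos_tags = ["VB", "NN", "CD"]  # toy tags, for illustration only
    freq = Counter(tokens)
    print(token_features(tokens, pos_tags, 1, freq))
    print(round(f_score(91.90, 92.91), 2))
```

One feature dictionary per token, collected sentence by sentence, is the kind of input that dictionary-based CRF wrappers accept; template-based tools would instead need the same values written out as columns.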