SVM Based Part of Speech Tagger for Malayalam

  • Authors:
  • Antony P.J.; Santhanu P. Mohan; Soman K.P.

  • Venue:
  • ITC '10 Proceedings of the 2010 International Conference on Recent Trends in Information, Telecommunication and Computing
  • Year:
  • 2010

Abstract

This paper presents the construction of a part-of-speech (POS) tagger for the Malayalam language using a Support Vector Machine (SVM). POS tagging plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval and information extraction. Because this POS tagging approach is based on supervised machine learning, it needs a large annotated training corpus to tag accurately. At this initial stage of POS tagging for Malayalam, the model was trained on a very limited annotated corpus, and we tried to maximize performance with the annotated data available. The objective of this project was to identify the ambiguities in Malayalam lexical items and to develop an efficient and accurate POS tagger. We developed our own tagset, consisting of 29 tags, for training and testing the POS tagger generators. A corpus of 180,000 words was used for training and testing the accuracy of the tagger generators. The results obtained were more efficient and accurate than those of earlier methods for Malayalam POS tagging.
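The abstract describes a supervised SVM tagger trained on an annotated corpus. The code below is a minimal sketch of that general technique, not the authors' implementation. It trains one binary linear SVM per tag, using Pegasos-style stochastic subgradient descent on the hinge loss, and tags each token with the highest-scoring tag. Several parts are assumptions made only for illustration: the feature set (`features`), the hyperparameters (`lam`, `epochs`), the tags (`PRP`, `NN`, `NNP`, `VF`, which are not the paper's 29-tag set), and the three toy transliterated sentences (which are not the paper's corpus). All names are likewise hypothetical.

```python
import random
from collections import defaultdict


def features(words, i):
    """Hypothetical sparse binary features for the token at position i."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "w=" + w,
        "suf3=" + w[-3:],  # suffix cue (assumed useful; Malayalam is agglutinative)
        "prev=" + prev_w,
        "next=" + next_w,
        "bias",
    ]


class OneVsRestLinearSVM:
    """One binary linear SVM per tag, trained with Pegasos-style SGD on the hinge loss."""

    def __init__(self, lam=0.001, epochs=50, seed=0):
        self.lam = lam          # L2 regularisation strength (assumed value)
        self.epochs = epochs
        self.seed = seed
        self.weights = {}       # tag -> {feature: weight}

    @staticmethod
    def _score(w, feats):
        return sum(w.get(f, 0.0) for f in feats)

    def fit(self, X, y):
        rng = random.Random(self.seed)
        data = list(zip(X, y))
        for tag in sorted(set(y)):
            w = defaultdict(float)
            t = 0
            for _ in range(self.epochs):
                rng.shuffle(data)
                for feats, label in data:
                    t += 1
                    eta = 1.0 / (self.lam * t)  # Pegasos step size
                    target = 1.0 if label == tag else -1.0
                    margin = target * self._score(w, feats)
                    # Regularisation step: shrink every weight by (1 - eta * lam).
                    shrink = 1.0 - eta * self.lam
                    for f in w:
                        w[f] *= shrink
                    # Hinge-loss subgradient step, only when the margin is violated.
                    if margin < 1.0:
                        for f in feats:
                            w[f] += eta * target
            self.weights[tag] = dict(w)
        return self

    def predict(self, feats):
        # Choose the tag whose binary classifier scores this token highest.
        return max(self.weights, key=lambda tag: self._score(self.weights[tag], feats))


def train_tagger(sentences):
    X, y = [], []
    for sent in sentences:
        words = [word for word, _ in sent]
        for i, (_, tag) in enumerate(sent):
            X.append(features(words, i))
            y.append(tag)
    return OneVsRestLinearSVM().fit(X, y)


def tag_sentence(model, words):
    return [model.predict(features(words, i)) for i in range(len(words))]


# Toy transliterated sentences with hypothetical tags (not the paper's data or tagset).
TRAIN = [
    [("avan", "PRP"), ("veettil", "NN"), ("poyi", "VF")],
    [("aval", "PRP"), ("pusthakam", "NN"), ("vaayichu", "VF")],
    [("raman", "NNP"), ("veettil", "NN"), ("vannu", "VF")],
]

model = train_tagger(TRAIN)
print(tag_sentence(model, ["avan", "veettil", "poyi"]))  # prints ['PRP', 'NN', 'VF']
```

One-vs-rest is used here only because it is the simplest way to turn binary SVMs into a multiclass tagger. A real system trained on a 29-tag corpus would need richer morphological features, a held-out test split, and probably an established SVM package in place of this hand-written optimiser.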