CRF Models for Tamil Part of Speech Tagging and Chunking

Authors:
S. Lakshmana Pandian;T. V. Geetha
Affiliations:
Department of Computer Science and Engineering, Anna University, Chennai, India 25;Department of Computer Science and Engineering, Anna University, Chennai, India 25
Venue:
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Year:
2009

Abstract

Conditional random fields (CRFs) are a framework for building probabilistic models to segment and label sequence data. CRFs offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax the strong independence assumptions made in those models. CRFs also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. In this paper we present CRF-based language models for part-of-speech (POS) tagging and chunking of Tamil. The models are designed around morphological information. The CRF-based POS tagger achieves an accuracy of about 89.18% on Tamil, and the chunker achieves an accuracy of 84.25% on the same language.
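
As a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a linear-chain CRF over a toy tag set. It is not the authors' implementation: the tag set, the two-character suffix feature (a crude stand-in for morphological information), the weights, and the placeholder tokens are all assumptions made for this example, and the tokens are not real Tamil words. The sketch shows how a whole tag sequence is scored, how the score is normalized over all tag sequences at once (the global normalization that avoids the bias towards states with few successors), and how the best sequence is decoded.

```python
import math

# Hypothetical toy tag set; the paper's actual tag set is not given here.
TAGS = ["N", "V"]
START = "<s>"


def features(words, i, prev_tag, tag):
    """Binary features at position i: a suffix cue and a tag-transition cue."""
    suffix = words[i][-2:]  # assumed stand-in for a morphological feature
    return [f"suf={suffix}|tag={tag}", f"prev={prev_tag}|tag={tag}"]


def local(weights, words, i, prev_tag, tag):
    """Sum of weights of the features that fire at position i."""
    return sum(weights.get(f, 0.0) for f in features(words, i, prev_tag, tag))


def score(weights, words, tags):
    """Unnormalized log-score of a complete tag sequence."""
    total, prev = 0.0, START
    for i, tag in enumerate(tags):
        total += local(weights, words, i, prev, tag)
        prev = tag
    return total


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(weights, words):
    """log Z(x), computed with the forward algorithm over all tag sequences."""
    alpha = {t: local(weights, words, 0, START, t) for t in TAGS}
    for i in range(1, len(words)):
        alpha = {
            t: logsumexp([alpha[p] + local(weights, words, i, p, t) for p in TAGS])
            for t in TAGS
        }
    return logsumexp(list(alpha.values()))


def prob(weights, words, tags):
    """Conditional probability p(tags | words), normalized per sentence."""
    return math.exp(score(weights, words, tags) - log_partition(weights, words))


def viterbi(weights, words):
    """Highest-scoring tag sequence by dynamic programming."""
    best = {t: (local(weights, words, 0, START, t), [t]) for t in TAGS}
    for i in range(1, len(words)):
        best = {
            t: max(
                ((best[p][0] + local(weights, words, i, p, t), best[p][1] + [t])
                 for p in TAGS),
                key=lambda cand: cand[0],
            )
            for t in TAGS
        }
    return max(best.values(), key=lambda cand: cand[0])[1]


# Invented weights and placeholder tokens, for illustration only.
WEIGHTS = {
    "suf=am|tag=N": 2.0,
    "suf=an|tag=V": 1.5,
    "prev=N|tag=V": 0.5,
    "prev=V|tag=N": 0.3,
}
WORDS = ["tokam", "tokan", "tokam"]

if __name__ == "__main__":
    tags = viterbi(WEIGHTS, WORDS)
    print(tags, prob(WEIGHTS, WORDS, tags))
```

In a trained tagger the weights would be estimated from annotated data rather than set by hand, and the feature function would draw on a real morphological analysis instead of a fixed-length suffix.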