A TENGRAM method based part-of-speech tagging of multi-category words in Hindi language

Authors:
J. P. Gupta;Devendra K. Tayal;Arti Gupta
Affiliations:
Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India;Department of Computer Engineering, GGSIP University, New Delhi, India;Department of Computer Science & IT, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
Venue:
Expert Systems with Applications: An International Journal
Year:
2011



Abstract

In this paper, we address the problem of part-of-speech tagging of multi-category words that appear in sentences of the Hindi language. First, we propose a Hindi tagger whose part-of-speech tags are developed from the grammar of the Hindi language. For this purpose, the Hindi Devanagari alphabet is used, and its transliteration is performed within the proposed tagger. Thereafter, a rule-based TENGRAM method is described with an illustrative example; it guides the disambiguation of multi-category words within the sentences of a Hindi corpus. The rules generated in TENGRAM result from the computation of discernibility matrices, discernibility functions, and reducts. These are computed from decision tables, which are based on rough set theory. In essence, a discernibility matrix helps eliminate indiscernible condition attributes; a discernibility function has rows corresponding to each column of the discernibility matrix and is used to derive the reducts; and the reducts provide a minimal subset of attributes that preserves the indiscernibility relation of the decision tables and hence generates the decision rules.
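The pipeline from decision table to discernibility matrix, reducts, and decision rules can be illustrated with a minimal Python sketch. It is not the TENGRAM implementation: the toy table, the attribute names (`prev_tag`, `next_tag`, `suffix`), and the tag values are hypothetical, and the sketch assumes the standard decision-relative formulation from rough set theory, in which a reduct is a minimal attribute set that intersects every non-empty entry of the discernibility matrix.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attributes, decision tag).
# Attribute names and values are illustrative, not taken from the paper.
ATTRS = ("prev_tag", "next_tag", "suffix")
TABLE = [
    ({"prev_tag": "NOUN", "next_tag": "VERB", "suffix": "a"}, "ADJ"),
    ({"prev_tag": "NOUN", "next_tag": "NOUN", "suffix": "a"}, "NOUN"),
    ({"prev_tag": "PRON", "next_tag": "VERB", "suffix": "a"}, "ADJ"),
    ({"prev_tag": "PRON", "next_tag": "NOUN", "suffix": "e"}, "NOUN"),
]


def discernibility_matrix(table, attrs):
    """Map each pair of rows with different decisions to the set of
    condition attributes on which the two rows differ."""
    matrix = {}
    for (i, (ci, di)), (j, (cj, dj)) in combinations(enumerate(table), 2):
        if di != dj:
            matrix[(i, j)] = frozenset(a for a in attrs if ci[a] != cj[a])
    return matrix


def reducts(table, attrs):
    """Brute-force search for minimal attribute subsets that intersect
    every non-empty matrix entry (assumed definition of a reduct)."""
    entries = [e for e in discernibility_matrix(table, attrs).values() if e]
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            chosen = set(subset)
            if any(set(r) <= chosen for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(chosen & e for e in entries):
                found.append(subset)
    return found


def decision_rules(table, reduct):
    """Project the table onto the reduct: each distinct combination of
    reduct values maps to the set of decisions observed for it."""
    rules = {}
    for cond, decision in table:
        key = tuple((a, cond[a]) for a in reduct)
        rules.setdefault(key, set()).add(decision)
    return rules


if __name__ == "__main__":
    for reduct in reducts(TABLE, ATTRS):
        for condition, tags in decision_rules(TABLE, reduct).items():
            print(condition, "=>", sorted(tags))
```

On this toy table, rows 0 and 1 differ only in `next_tag`, so every reduct must contain that attribute, and `next_tag` alone already distinguishes all rows with different tags. The brute-force search is exponential in the number of attributes and is suitable only for small illustrative tables.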