Automatic stochastic tagging of natural language texts

Authors: Evangelos Dermatas; George Kokkinakis
Affiliations: University of Patras; University of Patras
Venue: Computational Linguistics
Year: 1995

Abstract

Five language- and tagset-independent stochastic taggers, handling morphological and contextual information, are presented and tested on corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags: a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. Unknown words are tagged using an experimentally verified stochastic hypothesis that links the stochastic behavior of unknown words with that of the less probable known words. A fully automatic training and tagging program has been implemented on an IBM PC-compatible 80386-based computer. Measurements of error rate, response time, and memory requirements show that the taggers' performance is satisfactory even when only a small training text is available. The error rate decreases further when new texts are used to update the stochastic model parameters.
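
The unknown-word hypothesis mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as one possible reading, that "less probable known words" means words seen at most `max_count` times in the training text, and that their pooled tag frequencies stand in for the tag distribution of unknown words. The function name, the threshold, the tag labels, and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter


def unknown_word_tag_distribution(tagged_corpus, max_count=1):
    """Estimate P(tag | unknown word) from the tags of rare known words.

    tagged_corpus: list of (word, tag) pairs from a training text.
    max_count: assumed cutoff; words seen at most this often count as "rare".
    """
    # Count how often each word form occurs in the training text.
    word_counts = Counter(word for word, _ in tagged_corpus)

    # Pool the tags of all tokens whose word form is rare.
    rare_tag_counts = Counter(
        tag for word, tag in tagged_corpus if word_counts[word] <= max_count
    )

    total = sum(rare_tag_counts.values())
    if total == 0:
        return {}  # no rare words: nothing to estimate from

    # Normalise the pooled counts into a probability distribution.
    return {tag: n / total for tag, n in rare_tag_counts.items()}


# Toy training text (hypothetical data).
corpus = [
    ("the", "ART"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "ART"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("the", "ART"), ("dog", "NOUN"), ("runs", "VERB"),
    ("quickly", "ADV"),
]
dist = unknown_word_tag_distribution(corpus)
```

In this toy corpus the frequent words ("the", "dog") are excluded, so closed-class tags such as `ART` receive no probability mass for unknown words, which matches the intuition that new words are mostly open-class. Under this sketch, updating the model with new text amounts to recomputing the counts over the enlarged corpus.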