Variable-Length Markov Chains (VLMCs) offer a way of modeling contexts longer than trigrams without suffering from data sparsity or state-space explosion. In Historical Portuguese, however, two words show a high degree of ambiguity: que and a. Tagging errors on these two words account for a quarter of all errors made by a VLMC-based tagger. Moreover, they appear to exhibit two different types of ambiguity: one depending on non-local context and another on right context. We explored ways of extending the VLMC-based tagger with a number of different models and methods to tackle these issues. The methods met with varying degrees of success, one in particular resolving much of the ambiguity of a. We examine why this happened, and why none of the methods improved the tagging precision of que.
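The core idea behind a VLMC — conditioning on the longest tag history that has reliable counts, and backing off to shorter histories otherwise — can be sketched as follows. This is a minimal illustration, not the tagger from the paper; the tag inventory, context bound, and count threshold are invented for the example.

```python
from collections import defaultdict

MAX_ORDER = 3   # longest context considered (beyond-trigram histories)
MIN_COUNT = 2   # contexts seen fewer times than this are not trusted

# context (tuple of previous tags) -> next tag -> count
counts = defaultdict(lambda: defaultdict(int))

def train(tag_sequence):
    """Count every (context, next_tag) pair for contexts up to MAX_ORDER."""
    for i, tag in enumerate(tag_sequence):
        for order in range(0, MAX_ORDER + 1):
            if i - order < 0:
                break
            context = tuple(tag_sequence[i - order:i])
            counts[context][tag] += 1

def predict(history):
    """Back off from the longest matching context to shorter ones."""
    for order in range(min(MAX_ORDER, len(history)), -1, -1):
        context = tuple(history[len(history) - order:])
        dist = counts.get(context)
        if dist and sum(dist.values()) >= MIN_COUNT:
            return max(dist, key=dist.get)
    return None

# Toy training data: a repetitive DET-N-V pattern.
train(["DET", "N", "V", "DET", "N", "V", "DET", "ADJ", "N", "V"])
print(predict(["DET", "N"]))  # the bigram context (DET, N) is frequent enough
```

Because only contexts with sufficient counts are used, rare long histories fall back gracefully to shorter ones, which is what keeps the model compact compared to a fixed high-order Markov chain. Note also that such a model conditions only on left context, which is one reason right-context ambiguity of the kind described above is hard for it to resolve.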