Variable-Length Markov Chains (VLMCs) offer a way of modeling contexts longer than trigrams without suffering from data sparsity or state-space explosion. In Historical Portuguese, however, two words show a high degree of ambiguity: que and a. Tagging errors on these two words account for a quarter of all errors made by a VLMC-based tagger. Moreover, they appear to exhibit two different types of ambiguity: one depending on non-local context and another on right context. We explored ways of extending the VLMC-based tagger with a number of different models and methods to tackle these issues. The methods met with varying degrees of success, one in particular resolving much of the ambiguity of a. We examine why this happened, and why none of the methods improved the tagging precision of que.
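The core idea behind a VLMC — conditioning on the longest tag history that has reliable counts, and backing off to shorter histories otherwise — can be sketched as follows. This is a minimal illustration, not the tagger from the paper; the tag inventory, context bound, and count threshold are invented for the example.

```python
from collections import defaultdict

MAX_ORDER = 3   # longest context considered (beyond-trigram histories)
MIN_COUNT = 2   # contexts seen fewer times than this are not trusted

# context (tuple of previous tags) -> next tag -> count
counts = defaultdict(lambda: defaultdict(int))

def train(tag_sequence):
    """Count every (context, next_tag) pair for contexts up to MAX_ORDER."""
    for i, tag in enumerate(tag_sequence):
        for order in range(0, MAX_ORDER + 1):
            if i - order < 0:
                break
            context = tuple(tag_sequence[i - order:i])
            counts[context][tag] += 1

def predict(history):
    """Back off from the longest matching context to shorter ones."""
    for order in range(min(MAX_ORDER, len(history)), -1, -1):
        context = tuple(history[len(history) - order:])
        dist = counts.get(context)
        if dist and sum(dist.values()) >= MIN_COUNT:
            return max(dist, key=dist.get)
    return None

# Toy training data: a repetitive DET-N-V pattern.
train(["DET", "N", "V", "DET", "N", "V", "DET", "ADJ", "N", "V"])
print(predict(["DET", "N"]))  # the bigram context (DET, N) is frequent enough
```

Because only contexts with sufficient counts are used, rare long histories fall back gracefully to shorter ones, which is what keeps the model compact compared to a fixed high-order Markov chain. Note also that such a model conditions only on left context, which is one reason right-context ambiguity of the kind described above is hard for it to resolve.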