The analysis of textual data may start by classifying words using a predefined tag set. However, assigning part-of-speech tags to words in unrestricted text (POS tagging) is still a problem for natural language text understanding. Most current taggers require large amounts of hand-tagged text for training (on the order of 10^5 pre-tagged words). Producing this text needs highly trained linguists for a repetitive and tedious job, and the results obtained are not of optimal quality. Moreover, the same problem must be faced again whenever one moves to another text genre. Our proposal goes in another direction. We carefully combine a large lexicon with an efficient neural-network-based generator of taggers. This lets us generate POS taggers using no more than 10^4 hand-corrected tagged words for training, and a training text of that size can feasibly be corrected by hand. Experimental results are presented and discussed for the SUSANNE Corpus. Results on three additional Portuguese corpora are also discussed. A precision of 96% is obtained when unknown words occur in the test set, and 98% when every word in the test set is known.
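The abstract does not specify the network architecture or the features used. The Python sketch below is only a rough illustration of how a lexicon and a small trainable classifier can be combined, and it rests on several assumptions. The lexicon maps each word to the set of tags it may take, and unknown words may take any open-class tag. The input features are the lexicon "ambiguity classes" of a ±1 word window. A single-layer multiclass perceptron stands in for the paper's neural-network tagger generator. All names (`LEXICON`, `OPEN_CLASS`, `PerceptronTagger`, `tagging_precision`, `TRAIN`) and the toy data are hypothetical. "Precision" is computed here as the fraction of correctly tagged tokens.

```python
from collections import defaultdict

# Hypothetical toy lexicon (illustrative only, not from the paper).
OPEN_CLASS = {"NOUN", "VERB", "ADJ"}  # assumed candidate tags for unknown words
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "big": {"ADJ"},
    "cat": {"NOUN"},
    "dog": {"NOUN", "VERB"},
    "runs": {"NOUN", "VERB"},
    "barks": {"NOUN", "VERB"},
}

def candidates(word):
    """Tags the lexicon allows for a word; all open-class tags if it is unknown."""
    return LEXICON.get(word.lower(), OPEN_CLASS)

def features(words, i, window=1):
    """Binary features: the ambiguity class of each word in a +/-window context."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(words):
            feats.extend(f"{offset}:{tag}" for tag in sorted(candidates(words[j])))
        else:
            feats.append(f"{offset}:PAD")  # sentence boundary
    return feats

class PerceptronTagger:
    """Single-layer multiclass perceptron that only chooses among lexicon-allowed tags."""

    def __init__(self):
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats, tag):
        # .get avoids inserting zero entries into the defaultdict
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def predict(self, words, i):
        feats = features(words, i)
        # Ties go to the alphabetically first allowed tag.
        return max(sorted(candidates(words[i])), key=lambda t: self.score(feats, t))

    def train(self, sentences, epochs=5):
        for _ in range(epochs):
            for words, gold in sentences:
                for i, gold_tag in enumerate(gold):
                    guess = self.predict(words, i)
                    if guess != gold_tag:
                        # Perceptron update: reward the gold tag, penalize the guess.
                        for f in features(words, i):
                            self.weights[(f, gold_tag)] += 1.0
                            self.weights[(f, guess)] -= 1.0

    def tag(self, words):
        return [self.predict(words, i) for i in range(len(words))]

def tagging_precision(tagger, sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for words, gold in sentences:
        for pred, gold_tag in zip(tagger.tag(words), gold):
            correct += pred == gold_tag
            total += 1
    return correct / total if total else 0.0

# Tiny hand-tagged training set (hypothetical).
TRAIN = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "runs"], ["DET", "NOUN", "VERB"]),
    (["the", "big", "dog", "runs"], ["DET", "ADJ", "NOUN", "VERB"]),
]

if __name__ == "__main__":
    tagger = PerceptronTagger()
    tagger.train(TRAIN)
    print(tagging_precision(tagger, TRAIN))    # prints 1.0
    print(tagger.tag(["the", "fox", "runs"]))  # prints ['DET', 'NOUN', 'VERB']; "fox" is unknown
```

In this sketch the lexicon does much of the work. Unambiguous words are always tagged correctly, so the classifier only has to resolve the ambiguity classes. This is one plausible reason why a small hand-corrected training set can suffice, though the paper's actual method may differ.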