Feature generation is a difficult yet essential subtask of machine learning modeling. Usually, it is partially solved by a domain expert who builds complex, discriminative feature templates by conjoining the available basic features. This manual process is limited and expensive, and it is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method takes as input the training dataset with its basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call the method entropy-guided because it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, previously the best-performing system on the Bosque corpus. Furthermore, our approach allows the effortless inclusion of two new basic features, from which additional templates are generated automatically. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
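To make the quantity behind "entropy-guided" concrete, the following minimal sketch estimates the conditional entropy H(Y | X) of a local decision variable Y given a feature X, and shows how conjoining two features can lower it. It is not the authors' implementation: the function name `conditional_entropy`, the feature names `head_pos` and `mod_pos`, and the toy data are assumptions chosen only for illustration.

```python
from collections import Counter
from math import log2


def conditional_entropy(feature_values, labels):
    """Estimate H(Y | X) in bits from paired observations (x_i, y_i)."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))   # counts of (x, y)
    marginal = Counter(feature_values)             # counts of x
    h = 0.0
    for (x, _y), count in joint.items():
        # p(x, y) * log2 p(y | x), accumulated with a minus sign
        h -= (count / n) * log2(count / marginal[x])
    return h


# Hypothetical toy data: two basic features and a binary local decision.
head_pos = ["V", "V", "N", "N"]
mod_pos = ["N", "D", "N", "D"]
labels = [1, 0, 0, 1]

# Conjoin the two basic features into a single compound feature.
conjoined = list(zip(head_pos, mod_pos))

h_head = conditional_entropy(head_pos, labels)
h_mod = conditional_entropy(mod_pos, labels)
h_both = conditional_entropy(conjoined, labels)

print(f"H(Y | head_pos)          = {h_head:.2f} bits")
print(f"H(Y | mod_pos)           = {h_mod:.2f} bits")
print(f"H(Y | head_pos, mod_pos) = {h_both:.2f} bits")
```

In this toy case each basic feature alone leaves the decision fully uncertain, while their conjunction determines it, which is the kind of "discriminative together" behavior a template generator would look for. How the actual method selects and combines features from these entropies is not specified in the abstract.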