In this work, we apply a machine-learning-based system to Portuguese clause identification and evaluate it. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate the system, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a European and Brazilian Portuguese treebank. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor for this sub-task and use it to automatically add the PCL feature to the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. The system achieves an F_{β=1} of 73.90 when using the gold values of the PCL feature. When the automatically extracted values are used, it obtains an F_{β=1} of 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, they are achieved using a very simple PCL feature, which is generated by a PCL extractor developed with very little modeling effort.
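The F-measure with β = 1 reported above is the harmonic mean of precision and recall. The abstract does not describe the scoring procedure, so the following is only a minimal sketch under an assumption: a predicted clause counts as correct when its (start, end) token span exactly matches a gold clause. The function name `f_beta` and the toy spans are hypothetical and are not taken from the paper.

```python
def f_beta(gold_spans, predicted_spans, beta=1.0):
    """Return (precision, recall, F_beta) over exactly matching spans.

    Assumption: each clause is a (start, end) token-index pair and only
    exact matches count as correct.
    """
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    correct = len(gold & predicted)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical toy sentence: three gold clauses, three predictions,
# two of which match a gold clause exactly.
gold = {(0, 9), (2, 5), (6, 9)}
predicted = {(0, 9), (2, 4), (6, 9)}
precision, recall, f1 = f_beta(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

The scores in the abstract are given on a 0 to 100 scale, so a value from this sketch would be multiplied by 100 to be comparable.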