A machine learning approach to Portuguese clause identification

Authors:
Eraldo R. Fernandes;Cícero N. dos Santos;Ruy L. Milidiú
Affiliations:
Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, PUC-Rio, Rio de Janeiro, Brazil;Mestrado em Informática Aplicada – MIA, Universidade de Fortaleza – UNIFOR, Fortaleza, Brazil;Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, PUC-Rio, Rio de Janeiro, Brazil
Venue:
PROPOR'10 Proceedings of the 9th international conference on Computational Processing of the Portuguese Language
Year:
2010

Abstract

In this work, we apply a machine-learning-based system to Portuguese clause identification and evaluate it. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate the system, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a treebank of European and Brazilian Portuguese. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from the phrases in the Bosque corpus. We train an extractor for this subtask and use it to add the PCL feature to the derived clause corpus automatically. The proposed clause identifier takes the POS and PCL tags as input features. The system achieves an F_{β=1} score of 73.90 when using the gold-standard values of the PCL feature. When the automatically extracted values are used instead, it obtains an F_{β=1} score of 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, they are achieved with a very simple PCL feature, generated by a PCL extractor that required very little modeling effort.
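The abstract reports F_{β=1} scores without defining the metric. The sketch below shows how such a score could be computed from predicted and gold clause spans. It assumes an exact-match criterion (a predicted clause counts as correct only if both of its boundaries match a gold clause), which may differ from the evaluation actually used in the paper. The function name `f_beta`, the span representation, and the toy spans are hypothetical and serve only as illustration. They are not data from the paper.

```python
from typing import Set, Tuple

# A clause span as (first_token_index, last_token_index), both inclusive.
# This representation is an assumption made for the example.
Span = Tuple[int, int]


def f_beta(gold: Set[Span], predicted: Set[Span], beta: float = 1.0) -> float:
    """Return F_beta as a percentage, counting only exactly matching spans."""
    correct = len(gold & predicted)
    if correct == 0:
        # Covers empty predictions, empty gold sets, and zero overlap.
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    b2 = beta * beta
    return 100.0 * (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy sentence with three gold clauses; the system gets two of them right
# and misplaces the start boundary of the third.
gold = {(0, 9), (2, 5), (6, 9)}
predicted = {(0, 9), (2, 5), (7, 9)}

score = f_beta(gold, predicted)
print(f"{score:.2f}")  # precision = recall = 2/3, so this prints 66.67
```

With β = 1, precision and recall are weighted equally, so the score is their harmonic mean.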