ETL ensembles for chunking, NER and SRL

Authors:
Cícero N. dos Santos;Ruy L. Milidiú;Carlos E. M. Crestana;Eraldo R. Fernandes
Affiliations:
Mestrado em Informática Aplicada – MIA, Universidade de Fortaleza – UNIFOR, Fortaleza, Brazil;Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro – PUC-Rio, Rio de Janeiro, Brazil;Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro – PUC-Rio, Rio de Janeiro, Brazil;Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro – PUC-Rio, Rio de Janeiro, Brazil
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Abstract

We present a new ensemble method that uses Entropy Guided Transformation Learning (ETL) as the base learner. The proposed approach, ETL Committee, combines the main ideas of Bagging and Random Subspaces. We also propose a strategy to include redundancy in transformation-based models. To evaluate the effectiveness of the ensemble method, we apply it to three Natural Language Processing tasks: Text Chunking, Named Entity Recognition and Semantic Role Labeling. Our experimental findings indicate that ETL Committee significantly outperforms single ETL models and achieves results competitive with the state of the art. Some positive characteristics of the proposed ensemble strategy are worth mentioning. First, it improves ETL effectiveness without any additional human effort. Second, it is particularly useful when dealing with very complex tasks that use large feature sets. Finally, the resulting training and classification processes are very easy to parallelize.
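
The combination of Bagging and Random Subspaces mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base learner `train_majority_stub` is a hypothetical stand-in for ETL, and the function names, the committee size `n_members`, the `feature_fraction` value and the majority-vote combination are all assumptions made for illustration only.

```python
import random
from collections import Counter


def train_majority_stub(rows, labels, feature_idx):
    """Hypothetical stand-in for the base learner (NOT ETL).

    Memorizes the most frequent label for each tuple of selected feature
    values and falls back to the overall most frequent label.
    """
    by_key = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in feature_idx)
        by_key.setdefault(key, Counter())[label] += 1
    fallback = Counter(labels).most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}

    def predict(row):
        return table.get(tuple(row[i] for i in feature_idx), fallback)

    return predict


def train_committee(rows, labels, n_members=15, feature_fraction=0.5, seed=0):
    """Train an ensemble whose members each see a bootstrap sample of the
    examples (Bagging) and a random subset of the features (Random Subspaces).
    The default values are illustrative assumptions."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, round(feature_fraction * n_features))
    members = []
    for _ in range(n_members):
        # Bagging: draw len(rows) example indices with replacement.
        sample = [rng.randrange(len(rows)) for _ in rows]
        # Random Subspaces: keep k features chosen without replacement.
        feats = sorted(rng.sample(range(n_features), k))
        members.append(
            train_majority_stub(
                [rows[i] for i in sample], [labels[i] for i in sample], feats
            )
        )
    return members


def committee_predict(members, row):
    """Combine member predictions by simple majority vote (an assumption)."""
    votes = Counter(member(row) for member in members)
    return votes.most_common(1)[0][0]
```

Because each member is trained from its own sample and feature subset, with no dependence on the other members, the loop in `train_committee` could be distributed across processes or machines, which is consistent with the abstract's remark that training and classification are easy to parallelize.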