We introduce cause identification, a new problem involving the classification of incident reports in the aviation domain. Specifically, given a set of pre-defined causes, a cause identification system seeks to identify all and only those causes that can explain why the aviation incident described in a given report occurred. The difficulty of cause identification stems in part from the fact that it is a multi-class, multi-label categorization task, and in part from the skewed class distributions and the scarcity of annotated reports. To improve the performance of a cause identification system on the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data. Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure on the minority classes in comparison to a baseline that learns solely from the labeled data.
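The abstract's core idea — augmenting a small labeled training set with confidently self-labeled examples drawn from a large unlabeled pool — can be sketched as a generic self-training loop. This is a minimal illustration only, not the paper's actual algorithm: the toy keyword-count classifier, the confidence threshold, and the single-label-per-round simplification are all assumptions made for brevity (the real task is multi-class and multi-label).

```python
# Hedged sketch of self-training bootstrapping: a simplified stand-in
# for the paper's algorithm, using a toy keyword-count classifier.

def train(labeled):
    # Toy "model": per-class keyword counts from the labeled reports.
    model = {}
    for text, label in labeled:
        counts = model.setdefault(label, {})
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
    return model

def predict(model, text):
    # Score each class by keyword overlap; return (best_label, confidence).
    scores = {c: sum(kw.get(w, 0) for w in text.split())
              for c, kw in model.items()}
    best = max(scores, key=scores.get)
    total = sum(scores.values()) or 1
    return best, scores[best] / total

def bootstrap(labeled, unlabeled, rounds=3, threshold=0.75):
    # Each round: train on current labeled set, label the unlabeled pool,
    # and move only high-confidence predictions into the training set.
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            label, conf = predict(model, text)
            (confident if conf >= threshold else rest).append((text, label))
        if not confident:
            break
        labeled += confident               # augment the training set
        pool = [t for t, _ in rest]        # retry the rest next round
    return train(labeled)
```

For example, seeding with two labeled reports such as `("engine fire smoke", "mechanical")` and `("pilot fatigue error", "human")` and an unlabeled pool like `["smoke in engine bay", "fatigue during long shift"]` lets the loop absorb the unlabeled reports into the training set, which is the effect the abstract targets for minority classes.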