This paper reports on our experiments for the CoNLL-2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence-level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence-level weasel detection in the Wikipedia domain we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO encoding. In addition to surface-level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used to explore the large set of potential features. Our official results for Task 1 are an F1-score of 85.2 on the biological domain and 55.4 on the Wikipedia set. For Task 2, our official result is 2.1 for the entire task, with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are: Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6, cues: 78.5.
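To illustrate the BIO encoding used for the cue and scope sequence labelling task, the following sketch tags each token as B(egin), I(nside), or O(utside) with respect to annotated cue spans. The sentence and span indices are hypothetical examples, not data from the shared task.

```python
def bio_encode(tokens, spans):
    """Tag tokens with B/I/O labels, given (start, end) token spans
    (end exclusive) marking hedge cues."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"           # first token of the cue
        for i in range(start + 1, end):
            tags[i] = "I"           # continuation of the same cue
    return tags

tokens = "These results suggest that the protein may be involved".split()
# "suggest" (index 2) and "may be" (indices 6-7) marked as cues
print(bio_encode(tokens, [(2, 3), (6, 8)]))
# → ['O', 'O', 'B', 'O', 'O', 'O', 'B', 'I', 'O']
```

A sequence labeller trained over these tags can then recover multi-token cues by reading off maximal B-I runs.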
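The greedy forward selection strategy mentioned above can be sketched as follows: at each round, the candidate feature (group) that most improves a held-out evaluation score is added, and selection stops once no candidate helps. The scoring function below is a toy stand-in for cross-validated F1, not the paper's actual feature set.

```python
def greedy_forward_selection(candidates, evaluate):
    """Repeatedly add the candidate that most improves evaluate();
    stop when no remaining candidate raises the score."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # score each candidate extension of the current feature set
        score, best = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break                      # no candidate improves the score
        selected.append(best)
        remaining.remove(best)
        best_score = score
    return selected, best_score

# Toy scorer: feature utilities minus a penalty for larger sets
weights = {"token": 0.4, "lemma": 0.1, "pos": 0.3, "dep": 0.25}
score = lambda feats: sum(weights[f] for f in feats) - 0.05 * len(feats) ** 2

selected, best = greedy_forward_selection(weights, score)
print(selected)  # → ['token', 'pos']
```

With real feature templates, `evaluate` would retrain the max-margin model and return development-set F1, which makes each round expensive but keeps the search tractable compared to exhaustive subset enumeration.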