Using syntactic dependencies to solve coreferences

Authors:
Marcus Stamborg;Dennis Medved;Peter Exner;Pierre Nugues
Affiliations:
Lund University, Lund, Sweden;Lund University, Lund, Sweden;Lund University, Lund, Sweden;Lund University, Lund, Sweden
Venue:
CoNLL '12 Joint Conference on EMNLP and CoNLL - Shared Task
Year:
2012

References:

A machine learning approach to coreference resolution of noun phrases
Computational Linguistics - Special issue on computational anaphora resolution

Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research

CATiB: the Columbia Arabic Treebank
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes
CONLL Shared Task '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

Exploring lexicalized features for coreference resolution
CONLL Shared Task '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

CoNLL-2012 shared task: Modeling Multilingual Unrestricted Coreference in OntoNotes
CoNLL '12 Joint Conference on EMNLP and CoNLL - Shared Task

Abstract

This paper describes the structure of the LTH coreference solver used in the closed track of the CoNLL 2012 shared task (Pradhan et al., 2012). The core of the solver is a mention classifier that uses the algorithm of Soon et al. (2001) and features extracted from the dependency graphs of the sentences. The system builds on the solver of Björkelund and Nugues (2011), which we extended so that it can be applied to the three languages of the task: English, Chinese, and Arabic. We designed a new mention detection module that removes pleonastic pronouns, prunes constituents, and recovers mentions when they do not exactly match a noun phrase. We redesigned the features so that they reflect more complex linguistic phenomena as well as discourse properties. Finally, we introduced a minimal cluster model grounded in the first mention of an entity. We optimized the feature sets for the three languages: we carried out an extensive evaluation of pairs of features and complemented the single features with the associations that improved the CoNLL score. On English, Chinese, and Arabic, respectively, we obtained scores of 59.57, 56.62, and 48.25 on the development set and 59.36, 56.85, and 49.43 on the test set, for a combined official score of 55.21.
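
As an illustration of the closest-first pairwise linking usually associated with Soon et al. (2001), the sketch below scans the preceding mentions of each mention from right to left and links it to the first candidate that a pairwise classifier accepts. It is a minimal sketch and not the LTH solver: the function names `closest_first_links`, `clusters_from_links`, and `toy_same_string` are hypothetical, mentions are plain strings, and the toy classifier stands in for a trained model using dependency-based features, which are not modeled here.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical pairwise decision: (candidate antecedent, mention) -> bool.
PairClassifier = Callable[[str, str], bool]


def closest_first_links(
    mentions: Sequence[str], is_coreferent: PairClassifier
) -> List[Optional[int]]:
    """For each mention, return the index of its chosen antecedent or None.

    Candidates are tried from the closest preceding mention backwards,
    and the search stops at the first positive decision.
    """
    links: List[Optional[int]] = []
    for j in range(len(mentions)):
        antecedent: Optional[int] = None
        for i in range(j - 1, -1, -1):  # closest candidate first
            if is_coreferent(mentions[i], mentions[j]):
                antecedent = i
                break
        links.append(antecedent)
    return links


def clusters_from_links(links: Sequence[Optional[int]]) -> List[List[int]]:
    """Group mention indices into entities by following antecedent links."""
    cluster_of = {}
    clusters: List[List[int]] = []
    for j, antecedent in enumerate(links):
        if antecedent is None:
            # No antecedent found: the mention starts a new entity.
            cluster_of[j] = len(clusters)
            clusters.append([j])
        else:
            # Join the entity of the chosen antecedent.
            cluster_id = cluster_of[antecedent]
            cluster_of[j] = cluster_id
            clusters[cluster_id].append(j)
    return clusters


def toy_same_string(antecedent: str, mention: str) -> bool:
    """Toy stand-in for a trained classifier: case-insensitive string match."""
    return antecedent.lower() == mention.lower()


if __name__ == "__main__":
    doc = ["Obama", "the senator", "Obama", "the senator"]
    links = closest_first_links(doc, toy_same_string)
    print(links)
    print(clusters_from_links(links))
```

Because the search stops at the first accepted candidate, each mention receives at most one antecedent, and entities emerge as chains of links. Mentions without an antecedent appear here as singleton clusters; a real system would decide separately whether to keep or discard them.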