AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

Authors:
Marta Recasens; M. Antònia Martí
Affiliations:
Centre de Llenguatge i Computació (CLiC), University of Barcelona, Barcelona, Spain 08007; Centre de Llenguatge i Computació (CLiC), University of Barcelona, Barcelona, Spain 08007
Venue:
Language Resources and Evaluation
Year:
2010

Abstract

This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.
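
To illustrate what agreement "above chance" means, here is a minimal sketch of a chance-corrected agreement coefficient over link-type labels. The abstract does not say which coefficient was used, so Cohen's kappa for two annotators is an assumption made here for illustration only. The function name `cohen_kappa`, the label strings, and the toy data are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability that both pick the same label by
    # chance, given each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations of six links (illustrative only).
annotator_1 = ["identity", "identity", "predicative", "deixis", "identity", "predicative"]
annotator_2 = ["identity", "identity", "predicative", "identity", "identity", "deixis"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. On this reading, a figure such as 85% above chance corresponds to a coefficient of about 0.85.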