Elliphant: improved automatic detection of zero subjects and impersonal constructions in Spanish

  • Authors:
  • Luz Rello; Ricardo Baeza-Yates; Ruslan Mitkov

  • Affiliations:
  • NLP and Web Research Groups, Univ. Pompeu Fabra, Barcelona, Spain; Yahoo! Research, Barcelona, Spain; Research Group in Computational Linguistics, Univ. of Wolverhampton, UK

  • Venue:
  • EACL '12: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2012

Abstract

In pro-drop languages, the detection of explicit subjects, zero subjects and non-referential impersonal constructions is crucial for anaphora and co-reference resolution. While the identification of explicit and zero subjects has attracted the attention of researchers in the past, the automatic identification of impersonal constructions in Spanish has not yet been addressed; this work is the first such study. In this paper, we present a corpus to underpin research on the automatic detection of these linguistic phenomena in Spanish and a novel machine learning-based methodology for their computational treatment. This study also provides an analysis of the features, discusses performance across two different genres and offers an error analysis. The evaluation results show that our system performs better in detecting explicit subjects than alternative systems.