NADA: a robust system for non-referential pronoun detection

  • Authors:
  • Shane Bergsma; David Yarowsky

  • Affiliations:
  • Dept. of Computer Science and Human Language Technology Center of Excellence, Johns Hopkins University, US; Dept. of Computer Science and Human Language Technology Center of Excellence, Johns Hopkins University, US

  • Venue:
  • DAARC'11: Proceedings of the 8th International Conference on Anaphora Processing and Applications
  • Year:
  • 2011


Abstract

We present NADA: the Non-Anaphoric Detection Algorithm. NADA is a novel, publicly available program that accurately distinguishes between referential and non-referential instances of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, NADA uses very large-scale web N-gram features, but it makes these features practical by compressing the N-gram counts so that they fit in computer memory. NADA therefore operates as a fast, stand-alone system. NADA also improves over previous web-scale systems by considering the entire sentence, rather than a narrow context window, via long-distance lexical features. NADA very substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
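The abstract does not say how NADA compresses its counts or exactly which features it extracts, so the following is only a minimal, hypothetical sketch of the two ideas it names. It assumes (1) counts are reduced to one-byte log2 buckets and keyed by 64-bit hashes of the N-gram strings, stored in sorted parallel arrays instead of full strings, and (2) features combine count lookups for N-grams covering the target "it" with whole-sentence lexical features. The names `fingerprint`, `quantize`, `CompressedNgramCounts`, `window_ngrams`, and `features`, as well as the toy counts, are illustrative assumptions, not the paper's documented scheme or API.

```python
import bisect
import hashlib
from array import array


def fingerprint(ngram: str) -> int:
    """Hypothetical 64-bit hash of an N-gram; collisions are possible but rare."""
    digest = hashlib.blake2b(ngram.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def quantize(count: int) -> int:
    """Assumed compression: map a raw count to a one-byte log2 bucket (0 = unseen)."""
    return min(255, count.bit_length())


class CompressedNgramCounts:
    """Sorted 8-byte keys plus 1-byte values, instead of strings and full counts."""

    def __init__(self, counts: dict[str, int]):
        pairs = sorted((fingerprint(g), quantize(c)) for g, c in counts.items())
        self.keys = array("Q", (k for k, _ in pairs))
        self.vals = bytearray(v for _, v in pairs)

    def lookup(self, ngram: str) -> int:
        """Binary-search the fingerprint; return its bucket, or 0 if absent."""
        key = fingerprint(ngram)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return 0


def window_ngrams(tokens: list[str], i: int, n: int) -> list[str]:
    """All n-grams of order n that contain token position i."""
    first = max(0, i - n + 1)
    last = min(i, len(tokens) - n)
    return [" ".join(tokens[s:s + n]) for s in range(first, last + 1)]


def features(tokens: list[str], i: int, table: CompressedNgramCounts,
             max_n: int = 3) -> dict[str, int]:
    """Local N-gram count features plus long-distance, whole-sentence word features."""
    feats = {}
    for n in range(2, max_n + 1):
        for g in window_ngrams(tokens, i, n):
            feats[f"ngram{n}:{g}"] = table.lookup(g)
    # Every other word in the sentence, regardless of distance from "it".
    for j, w in enumerate(tokens):
        if j != i:
            feats[f"word:{w.lower()}"] = 1
    return feats


# Toy usage with made-up counts (not real web N-gram data).
table = CompressedNgramCounts({"it is raining": 1000, "is raining": 5000, "it is": 900000})
tokens = "Outside , it is raining .".split()
f = features(tokens, 2, table)
```

In this sketch each stored entry costs 9 bytes (an 8-byte key and a 1-byte bucket) regardless of N-gram length; a real system could shrink keys further, at the cost of more hash collisions. The resulting feature dictionary would then be passed to any standard classifier.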