Improving the identification of non-anaphoric it using support vector machines

  • Authors:
  • José Carlos Clemente Litrán;Kenji Satou;Kentaro Torisawa

  • Affiliations:
  • Japan Advanced Institute of Science and Technology (JAIST), Tatsunokuchi, Ishikawa, Japan;Japan Advanced Institute of Science and Technology (JAIST), Tatsunokuchi, Ishikawa, Japan;Japan Advanced Institute of Science and Technology (JAIST), Tatsunokuchi, Ishikawa, Japan

  • Venue:
  • JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
  • Year:
  • 2004

Abstract

Identifying non-anaphoric uses of the pronoun it is crucial for achieving full anaphora resolution. Nevertheless, this problem has been either ignored or considered too simple to deserve deeper study. In this paper we present a machine-learning approach using Support Vector Machines. We collected several instances of both anaphoric and non-anaphoric it from the GENIA corpus, together with syntactic information about their context. We show that, using only a limited amount of knowledge, our approach achieves better accuracy than previous methods. We also analyze the relevance of the features used to predict non-anaphoric uses.
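
The abstract does not list the features or the SVM configuration, so the following is only a minimal sketch of the general idea: turn the context around each occurrence of it into a small feature vector and train a linear SVM to separate non-anaphoric from anaphoric uses. Everything concrete here is an assumption made for illustration: the cue-word lists, the four features, the toy sentences (which are not from GENIA), and the plain subgradient-descent trainer, which stands in for whatever SVM implementation and kernel the authors actually used.

```python
# Hypothetical cue lists; the paper's real feature set is not given in the abstract.
COPULA_LIKE = {"is", "was", "seems", "appears"}
COMPLEMENTIZERS = {"that", "to"}
PREPOSITIONS = {"of", "in", "on", "with", "by", "to"}


def extract_features(sentence, window=5):
    """Map the context of the first 'it' in a sentence to a feature vector.

    Raises ValueError if the sentence contains no token 'it'.
    """
    tokens = sentence.lower().split()
    i = tokens.index("it")
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    prev_tok = tokens[i - 1] if i > 0 else ""
    following = tokens[i + 1:i + 1 + window]
    return [
        1.0,                                                  # bias term
        float(next_tok in COPULA_LIKE),                       # "it is", "it seems"
        float(any(t in COMPLEMENTIZERS for t in following)),  # "... that", "... to"
        float(prev_tok in PREPOSITIONS),                      # "to it", "of it"
    ]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss.

    Simplification: the bias is an ordinary (regularized) weight.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # example is inside the margin or misclassified
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return +1 (non-anaphoric) or -1 (anaphoric)."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > 0 else -1


# Invented toy examples: +1 = non-anaphoric, -1 = anaphoric.
TRAIN = [
    ("It is known that the protein binds DNA", 1),
    ("It is possible to detect the signal", 1),
    ("It appears that expression is reduced", 1),
    ("The gene is active and it encodes a kinase", -1),
    ("We cloned the receptor and expressed it in cells", -1),
    ("The complex and the proteins bound to it were purified", -1),
]

X = [extract_features(s) for s, _ in TRAIN]
y = [label for _, label in TRAIN]
weights = train_linear_svm(X, y)

for s in ["It is clear that the cells divide", "The cells absorbed it quickly"]:
    print(s, "->", predict(weights, extract_features(s)))
```

In a linear model like this one, the magnitude of each learned weight gives a rough indication of how much the corresponding feature contributes to the decision, which is one simple way to approach the feature-relevance question the abstract raises.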