Improving the identification of non-anaphoric it using support vector machines

Authors: José Carlos Clemente Litrán; Kenji Satou; Kentaro Torisawa
Affiliation (all three authors): Japan Advanced Institute of Science and Technology (JAIST), Tatsunokuchi, Ishikawa, Japan
Venue: JNLPBA '04: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Year: 2004

Abstract

Identifying non-anaphoric uses of the pronoun it is crucial for full anaphora resolution. Nevertheless, this problem has either been ignored or been considered too simple to merit deeper study. In this paper we present a machine-learning approach based on Support Vector Machines. We collected instances of both anaphoric and non-anaphoric it from the GENIA corpus, together with syntactic information about their context. We show that, using only a limited amount of knowledge, our approach achieves better accuracy than previous methods. We also analyze the relevance of the features used to predict non-anaphoric uses.
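The abstract frames the task as binary classification of each occurrence of it from features of its context. The following is a minimal sketch of that framing under loudly stated assumptions. The feature set (neighbouring words plus a hypothetical "that"/"to"/"whether" cue for extraposition) is invented for illustration and is not the paper's feature set. The toy sentences are made up, not drawn from GENIA. The linear SVM is trained with Pegasos (stochastic subgradient descent on the L2-regularized hinge loss), a swapped-in training method rather than the software the authors used. All names (`extract_features`, `train_linear_svm`, `predict`, `RAW`, `TRAIN`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Features = Dict[str, float]

def extract_features(tokens: List[str], i: int) -> Features:
    """Sparse binary features for the occurrence of 'it' at tokens[i].

    HYPOTHETICAL feature set for illustration only; not the paper's features.
    """
    assert tokens[i].lower() == "it", "index must point at 'it'"
    window = [tok.lower() for tok in tokens[i + 1:i + 5]]
    feats: Features = {"bias": 1.0}
    if window:
        feats["next=" + window[0]] = 1.0
    if i > 0:
        feats["prev=" + tokens[i - 1].lower()] = 1.0
    # Hypothetical extraposition cue, e.g. "it is likely that ...".
    if any(w in ("that", "to", "whether") for w in window):
        feats["that_or_to_within_4"] = 1.0
    return feats

def dot(w: Features, x: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(data: List[Tuple[Features, int]], lam: float = 0.1,
                     epochs: int = 100, seed: int = 0) -> Features:
    """Linear SVM via Pegasos: stochastic subgradient descent on
    (lam / 2) * ||w||^2 + mean hinge loss. Labels must be +1 or -1.
    For simplicity the bias is an ordinary (regularized) feature."""
    rng = random.Random(seed)
    samples = list(data)
    w: Features = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active
            for k in w:  # regularization step: w <- (1 - eta * lam) * w
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge subgradient step: w <- w + eta * y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w: Features, x: Features) -> int:
    """+1 = non-anaphoric it, -1 = anaphoric it."""
    return 1 if dot(w, x) >= 0.0 else -1

# Invented toy sentences (NOT from GENIA): (sentence, index of "it", label).
RAW = [
    ("It is likely that the protein binds DNA .", 0, 1),
    ("It was shown that IL-2 activates T cells .", 0, 1),
    ("It is necessary to purify the extract .", 0, 1),
    ("The gene is expressed and it encodes a kinase .", 5, -1),
    ("The receptor was cloned and it binds the ligand .", 5, -1),
    ("This factor is nuclear ; it activates transcription .", 5, -1),
]
TRAIN = [(extract_features(s.split(), i), y) for s, i, y in RAW]

if __name__ == "__main__":
    w = train_linear_svm(TRAIN)
    toks = "It is possible that the mutation disrupts binding .".split()
    print(predict(w, extract_features(toks, 0)))
    # Largest-magnitude weights: a crude view of feature relevance.
    for name, weight in sorted(w.items(), key=lambda kv: -abs(kv[1]))[:5]:
        print(f"{name:>24s} {weight:+.3f}")
```

A linear model is used here because its weights can be read off directly, which gives a rough, illustrative analogue of the feature-relevance analysis the abstract mentions; with a non-linear kernel, relevance would have to be estimated differently (for example by ablating features).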