Building a pronominalization model by feature selection and machine learning

  • Authors:
  • Ji-Eun Roh;Jong-Hyeok Lee

  • Affiliations:
  • Div. of Electrical and Computer Engineering, POSTECH, and Advanced Information Technology Research Center (AITrc), Pohang, Republic of Korea;Div. of Electrical and Computer Engineering, POSTECH, and Advanced Information Technology Research Center (AITrc), Pohang, Republic of Korea

  • Venue:
  • IJCNLP'04 Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

Pronominalization is an important component of generating coherent text. In this paper, we identify features that influence pronominalization and construct a pronoun generation model using various machine learning techniques. Old entities, which are the targets of pronominalization, are categorized into three types according to their tendency in the attentional state: Cb and old-Cp, both derived from the Centering model, and the remaining old entities. We construct a pronoun generation model for each type. Eighty-seven texts gathered from three genres are used for training and testing. Using this corpus, we verify that the proposed features are well suited to explaining pronominalization in Korean, and we show that our model significantly outperforms previous ones at the 99% confidence level according to a t-test. We also identify central features that strongly influence pronominalization across genres.
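
The three-way categorization of old entities mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: utterances are represented as lists of entity names already ranked by salience (the Cf list of Centering theory), Cb is taken in the standard Centering sense as the highest-ranked entity of the previous utterance that is realized in the current one, old-Cp is assumed to be the top-ranked entity of the current utterance when it has been mentioned before and is not the Cb, and an "old" entity is assumed to be any entity mentioned earlier in the text. The function names `backward_center` and `categorize` and the label "other-old" are hypothetical.

```python
from typing import List, Optional, Set


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    """Return Cb of the current utterance: the highest-ranked entity of the
    previous utterance's Cf list that is also realized in the current one."""
    realized = set(cur_cf)
    for entity in prev_cf:  # prev_cf is assumed to be ordered by salience
        if entity in realized:
            return entity
    return None


def categorize(entity: str, prev_cf: List[str], cur_cf: List[str],
               mentioned: Set[str]) -> str:
    """Assign an entity of the current utterance to one of the categories.

    'mentioned' holds every entity seen earlier in the text (assumed
    definition of an old entity). New entities are not pronominalization
    targets, so they get their own label.
    """
    if entity not in mentioned:
        return "new"
    if entity == backward_center(prev_cf, cur_cf):
        return "Cb"
    if cur_cf and entity == cur_cf[0]:  # cur_cf[0] plays the role of Cp
        return "old-Cp"
    return "other-old"


# Hypothetical two-utterance example with hand-ranked Cf lists.
prev_cf = ["Mary", "book"]
cur_cf = ["John", "Mary", "book", "store"]
mentioned = {"John", "Mary", "book"}

labels = {e: categorize(e, prev_cf, cur_cf, mentioned) for e in cur_cf}
print(labels)
```

Under these assumptions, each label other than "new" would select one of the three type-specific pronoun generation models. The features and learners behind those models are not given in the abstract, so they are left out of the sketch.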