Named entity recognition using acyclic weighted digraphs: a semi-supervised statistical method

  • Authors:
  • Kono Kim; Yeohoon Yoon; Harksoo Kim; Jungyun Seo

  • Affiliations:
  • Natural Language Processing Laboratory, Department of Computer Science, Sogang University, Seoul, Korea; NHN Corporation, Seongname-City, Gyeonggi-do, Korea; College of Information Technology, Kangwon National University, Chuncheon-si, Gangwon-do, Korea; Department of Computer Science and Interdisciplinary Program of Integrated Biotechnology, Sogang University, Seoul, Korea

  • Venue:
  • PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2007

Abstract

We propose an NE (Named Entity) recognition system that uses a semi-supervised statistical method. At training time, the system builds error-prone training data using only a conventional POS (Part-Of-Speech) tagger and an NE dictionary that is constructed semi-automatically. It then generates a co-occurrence similarity matrix from this error-prone training corpus. At run time, the system constructs AWDs (Acyclic Weighted Digraphs) based on the co-occurrence similarity matrix. It then detects NE candidates and assigns categories to them by running a Viterbi search on the AWDs. In preliminary experiments on PLO (Person, Location and Organization) recognition, the proposed system achieved an average F1-measure of 81.32%.
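
To illustrate the run-time step, the following is a minimal sketch of a Viterbi search over a layered acyclic weighted digraph, where each node is a (token position, category) pair and edge weights are additive scores. It is not the authors' implementation: the abstract does not say how edge weights are derived from the co-occurrence similarity matrix, so the `EMIT` and `TRANS` tables, the `emit` and `trans` helpers, the label set and all scores are hypothetical values chosen only for illustration.

```python
# Hypothetical label set: O (not an entity), PER, LOC.
LABELS = ["O", "PER", "LOC"]

# Hypothetical node scores, standing in for values that a real system
# might derive from a co-occurrence similarity matrix.
EMIT = {("Kim", "PER"): 2.0, ("visited", "O"): 1.0, ("Seoul", "LOC"): 2.0}

# Hypothetical edge scores between adjacent categories.
TRANS = {("PER", "LOC"): -1.0}


def emit(token, label):
    """Score for assigning `label` to `token` (0.0 if unknown)."""
    return EMIT.get((token, label), 0.0)


def trans(prev_label, label):
    """Score for the edge from `prev_label` to `label` (0.0 if unknown)."""
    return TRANS.get((prev_label, label), 0.0)


def viterbi(tokens, labels):
    """Return the highest-scoring label sequence for `tokens`."""
    if not tokens:
        return []
    # best[i][y]: score of the best path that ends at position i with label y
    best = [{y: emit(tokens[0], y) for y in labels}]
    # back[i][y]: label at position i - 1 on that best path
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + trans(p, y)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emit(tokens[i], y)
            back[i][y] = prev
    # Follow the back-pointers from the best final node.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tagged = viterbi(["Kim", "visited", "Seoul"], LABELS)
print(tagged)  # ['PER', 'O', 'LOC']
```

Because the graph is acyclic and layered by token position, each node is visited once, so the search costs time proportional to the number of tokens times the square of the number of categories.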