A hybrid approach of pattern extraction and semi-supervised learning for Vietnamese named entity recognition

  • Authors:
  • Duc-Thuan Vo;Cheol-Young Ock

  • Affiliations:
  • Natural Language Processing Lab, School of Computer Engineering and Information Technology, University of Ulsan, Korea;Natural Language Processing Lab, School of Computer Engineering and Information Technology, University of Ulsan, Korea

  • Venue:
  • ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
  • Year:
  • 2012

Abstract

Current supervised approaches to Vietnamese Named Entity Recognition require a large hand-annotated corpus, which is challenging to obtain. We therefore propose a hybrid approach that combines pattern extraction with semi-supervised learning. A rule-based method generates patterns automatically: part-of-speech tagging, lexical diversity, and chunking are explored to define the pattern-extraction rules, and the extracted patterns are used to identify potential named entities. Semi-supervised learning then trains on a small set of seed named entities to categorize the named entities in the extracted patterns. In experiments, our approach yields a good increase in system accuracy compared with other approaches for Vietnamese.
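
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single extraction rule (a proper-noun token plus its left neighbour as context), the `Np` tag, the toy sentences, the seed dictionary, and the function names `extract_patterns` and `bootstrap` are all assumptions made for illustration, and simple majority voting over contexts stands in for whatever semi-supervised learner the paper actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: each sentence is a list of (token, POS) pairs.
# The "Np" (proper noun) tag and every example below are assumptions.
SENTENCES = [
    [("ông", "N"), ("Nguyễn Du", "Np"), ("sống", "V"), ("tại", "E"), ("Hà Nội", "Np")],
    [("ông", "N"), ("Trần Phú", "Np"), ("đến", "V"), ("tại", "E"), ("Huế", "Np")],
]

# A small set of seed entities with known categories (assumed).
SEEDS = {"Nguyễn Du": "PER", "Hà Nội": "LOC"}


def extract_patterns(sentences):
    """Stage 1 (assumed rule): every proper-noun token is a candidate entity,
    and the token to its left is recorded as the pattern context."""
    pairs = []
    for sent in sentences:
        for i, (token, pos) in enumerate(sent):
            if pos == "Np" and i > 0:
                pairs.append((sent[i - 1][0], token))
    return pairs


def bootstrap(pairs, seeds, rounds=3):
    """Stage 2 (assumed learner): propagate seed labels to unlabeled
    candidates that share a context, by majority vote per context."""
    labels = dict(seeds)
    for _ in range(rounds):
        # Count which categories each context has been seen with so far.
        votes = defaultdict(Counter)
        for context, entity in pairs:
            if entity in labels:
                votes[context][labels[entity]] += 1
        # Label unlabeled candidates with their context's majority category.
        new_labels = {}
        for context, entity in pairs:
            if entity not in labels and context in votes:
                new_labels[entity] = votes[context].most_common(1)[0][0]
        if not new_labels:
            break  # nothing new was learned in this round
        labels.update(new_labels)
    return labels


labels = bootstrap(extract_patterns(SENTENCES), SEEDS)
for entity, category in labels.items():
    print(entity, category)
```

In this toy setup the two unlabeled candidates inherit their categories from the seeds that appear in the same left context. A realistic system would need richer rules (chunk boundaries, lexical diversity) and a confidence threshold to limit the spread of wrong labels across rounds.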