Combining proper name-coreference with conditional random fields for semi-supervised named entity recognition in Vietnamese text

  • Authors:
  • Rathany Chan Sam;Huong Thanh Le;Thuy Thanh Nguyen;Thien Huu Nguyen

  • Affiliations:
  • Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year:
  • 2011

Abstract

Named entity recognition (NER) is the task of locating atomic elements in text and classifying them into predefined categories such as the names of persons, organizations and locations. Most existing NER systems are based on supervised learning, which often requires a large amount of labelled training data that is very time-consuming to build. To address this problem, we introduce a semi-supervised learning method for recognizing named entities in Vietnamese text by combining proper name coreference and named-ambiguity heuristics with a powerful sequential learning model, Conditional Random Fields. Our approach inherits the idea of Liao and Veeramachaneni [6] and extends it by using proper name coreference. Starting from a model trained on a small, manually annotated data set, the learning model extracts high-confidence named entities and finds low-confidence ones by using proper name coreference rules. The low-confidence named entities are added to the training set so that the model can learn new context features. The F-scores of the system for extracting "Person", "Location" and "Organization" entities are 83.36%, 69.53% and 65.71% when applying the heuristics proposed by Liao and Veeramachaneni. With our proposed heuristics, these values are 93.13%, 88.15% and 79.35%, respectively. These results show that our method improves the system's accuracy.
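The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not spell out the coreference rules, the confidence threshold, or the label set, so the prefix/suffix rule in `is_coreferent`, the 0.9 threshold, and the names `Mention` and `select_for_retraining` are all assumptions made for illustration. The CRF tagger itself is left out; its confidence scores are simply supplied as input.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    text: str          # surface form of the candidate entity
    label: str         # label proposed by the tagger ("O" = not an entity)
    confidence: float  # tagger confidence in [0, 1] (supplied, not computed here)


def is_coreferent(short: str, full: str) -> bool:
    """Assumed rule: `short` matches the leading or trailing tokens of the
    longer name `full`, e.g. "Dung" vs. "Nguyen Tan Dung"."""
    s, f = short.split(), full.split()
    if not s or len(s) >= len(f):
        return False
    return f[:len(s)] == s or f[-len(s):] == s


def select_for_retraining(mentions: List[Mention],
                          threshold: float = 0.9) -> List[Mention]:
    """Return low-confidence mentions that corefer with a high-confidence
    entity, relabelled with that entity's label."""
    high = [m for m in mentions if m.confidence >= threshold]
    selected = []
    for m in mentions:
        if m.confidence >= threshold:
            continue  # already a high-confidence entity
        for h in high:
            if is_coreferent(m.text, h.text):
                # Inherit the label of the coreferent high-confidence name.
                selected.append(Mention(m.text, h.label, m.confidence))
                break
    return selected


# Toy input with made-up confidence scores (ASCII spellings for simplicity).
mentions = [
    Mention("Nguyen Tan Dung", "PER", 0.97),
    Mention("Dung", "O", 0.42),
    Mention("Ha Noi", "LOC", 0.95),
    Mention("hom nay", "O", 0.30),
]
picked = select_for_retraining(mentions)
print(picked)
```

In a full self-training loop, the mentions in `picked` would be added to the training set together with their sentences, and the model would be retrained so that it can pick up new context features from them.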