Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora

  • Authors:
  • Tri-Thanh Nguyen; Akira Shimazu

  • Venue:
  • IEICE - Transactions on Information and Systems
  • Year:
  • 2007

Abstract

Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of types remains fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract fine-grained categories of person named entities from text documents. Building on the Dual Iterative Pattern Relation Extraction method, we develop a model better suited to this problem and explore the generation of different pattern types. We also propose a method for validating extracted categories to improve performance. Experiments on the Wall Street Journal corpus give promising results.
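
To make the bootstrapping idea concrete, the following is a minimal sketch of a generic DIPRE-style loop applied to (person, category) pairs. It is not the model developed in the paper: the function names, the regular expressions, and the restriction to "person, middle text, category" order are all assumptions made for illustration, and the category validation step mentioned in the abstract is not modeled.

```python
import re

# Hypothetical helper: learn "middle" strings that connect a known
# person to a known category, e.g. ", a " in "John Smith, a lawyer".
def find_patterns(corpus, pairs):
    patterns = set()
    for sentence in corpus:
        for person, category in pairs:
            # Assumption: the person precedes the category, with at most
            # 20 characters in between.
            m = re.search(
                re.escape(person) + r"(.{1,20}?)" + re.escape(category),
                sentence,
            )
            if m:
                patterns.add(m.group(1))
    return patterns

# Hypothetical helper: apply the learned middles to find new pairs.
def apply_patterns(corpus, patterns):
    pairs = set()
    # Toy assumption: a person is a run of capitalized words and a
    # category is a single lowercase word.
    name = r"((?:[A-Z][a-z]+ )*[A-Z][a-z]+)"
    for sentence in corpus:
        for middle in patterns:
            regex = name + re.escape(middle) + r"([a-z]+)"
            for m in re.finditer(regex, sentence):
                pairs.add((m.group(1), m.group(2)))
    return pairs

# Alternate between pattern induction and pair extraction.
def bootstrap(corpus, seeds, iterations=2):
    pairs = set(seeds)
    for _ in range(iterations):
        patterns = find_patterns(corpus, pairs)
        pairs |= apply_patterns(corpus, patterns)
    return pairs
```

Starting from a single seed such as ("John Smith", "lawyer"), each iteration can add both new patterns and new pairs. Patterns as loose as " is a " also match non-person text, which is one reason a validation step of the kind the abstract describes would be needed in practice.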