Analysis and robust extraction of changing named entities

  • Authors:
  • Masatoshi Tsuchiya;Shoko Endo;Seiichi Nakagawa

  • Affiliations:
  • Toyohashi University of Technology;Toyohashi University of Technology;Toyohashi University of Technology

  • Venue:
  • NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper focuses on the change of named entities over time and its influence on the performance of the named entity tagger. First, we analyze Japanese named entities which appear in Mainichi Newspaper articles published in 1995, 1996, 1997, 1998 and 2005. This analysis reveals that the number of named entity types and the number of named entity tokens are almost steady over time and that 70 ~ 80% of named entity types in a certain year occur in the articles published either in its succeeding year or in its preceding year. These facts lead that 20 ~ 30% of named entity types are replaced with new ones every year. The experiment against these texts shows that our proposing semi-supervised method which combines a small annotated corpus and a large unannotated corpus for training works robustly although the traditional supervised method is fragile against the change of name entity distribution.