Japanese Named Entity extraction with redundant morphological analysis

  • Authors:
  • Masayuki Asahara;Yuji Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology, Japan;Nara Institute of Science and Technology, Japan

  • Venue:
  • NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Named Entity (NE) extraction is an important subtask of document processing such as information extraction and question answering. A typical method used for NE extraction of Japanese texts is a cascade of morphological analysis, POS tagging and chunking. However, there are some cases where segmentation granularity contradicts the results of morphological analysis and the building units of NEs, so that extraction of some NEs are inherently impossible in this setting. To cope with the unit problem, we propose a character-based chunking method. Firstly, the input sentence is analyzed redundantly by a statistical morphological analyzer to produce multiple (n-best) answers. Then, each character is annotated with its character types and its possible POS tags of the top n-best answers. Finally, a support vector machine-based chunker picks up some portions of the input sentence as NEs. This method introduces richer information to the chunker than previous methods that base on a single morphological analysis result. We apply our method to IREX NE extraction task. The cross validation result of the F-measure being 87.2 shows the superiority and effectiveness of the method.