Robust extraction of named entity including unfamiliar word

Authors:
Masatoshi Tsuchiya;Shinya Hida;Seiichi Nakagawa
Affiliations:
Toyohashi University of Technology;Toyohashi University of Technology;Toyohashi University of Technology
Venue:
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Year:
2008

Citing 8
Cited 1

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Representing text chunks

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Named entity chunking techniques in supervised learning for Japanese named entity recognition

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Japanese named entity extraction evaluation: analysis of results

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Efficient support vector classifiers for named entity recognition

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Japanese named entity recognition based on a simple rule generator and decision tree learning

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Japanese Named Entity extraction with redundant morphological analysis

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

Analysis and robust extraction of changing named entities

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration

Quantified Score

Hi-index	0.00

Abstract

This paper proposes a novel method that uses a large unannotated corpus to extract named entities containing unfamiliar words, that is, words that do not occur, or occur only a few times, in the training corpus. The proposed method consists of two steps. The first step assigns to each unfamiliar word the most similar familiar word, based on context vectors calculated from the large unannotated corpus. The second step applies traditional machine learning approaches to the resulting text. Experiments on extracting Japanese named entities from the IREX corpus and the NHK corpus show the effectiveness of the proposed method.
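
The first step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how context vectors are built, which similarity measure is used, or where the familiarity threshold lies. The window-based co-occurrence counts, the cosine similarity, the `min_count` threshold, the function names, and the toy romanized data below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count the words seen within +/- `window` tokens of each word.

    Assumption: a simple co-occurrence count stands in for the paper's
    context vectors, whose exact definition is not given in the abstract.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def substitute_unfamiliar(tokens, train_counts, vectors, min_count=2):
    """Replace each unfamiliar word with its most similar familiar word.

    A word is treated as unfamiliar when it occurs fewer than `min_count`
    times in the annotated training corpus (hypothetical threshold).
    """
    familiar = [w for w, c in train_counts.items()
                if c >= min_count and w in vectors]
    result = []
    for word in tokens:
        is_familiar = train_counts.get(word, 0) >= min_count
        if is_familiar or word not in vectors or not familiar:
            result.append(word)  # keep the word when no substitute is available
        else:
            result.append(max(familiar,
                              key=lambda f: cosine(vectors[word], vectors[f])))
    return result


# Toy unannotated corpus and training-corpus word counts (invented data).
unlabeled = [
    ["mr", "tanaka", "visited", "tokyo"],
    ["mr", "suzuki", "visited", "tokyo"],
    ["mr", "tanaka", "visited", "osaka"],
]
train_counts = Counter({"mr": 5, "tanaka": 4, "visited": 3, "tokyo": 3, "osaka": 2})

vectors = context_vectors(unlabeled)
replaced = substitute_unfamiliar(["mr", "suzuki", "visited", "osaka"],
                                 train_counts, vectors)
print(replaced)
```

In this toy setting "suzuki" never appears in the training counts, so it is swapped for the familiar word whose contexts look most alike. The substituted sequence would then be passed to whatever sequence labeler serves as the second step; the works cited by this paper include support vector machines and conditional random fields, but the abstract does not say which learner was used.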