Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Message Understanding Conference-6: a brief history
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Named entity chunking techniques in supervised learning for Japanese named entity recognition
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Japanese named entity extraction evaluation: analysis of results
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Efficient support vector classifiers for named entity recognition
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Japanese named entity recognition based on a simple rule generator and decision tree learning
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Japanese Named Entity extraction with redundant morphological analysis
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A high-performance semi-supervised learning method for text chunking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Robust extraction of named entity including unfamiliar word
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Hi-index | 0.00 |
This paper focuses on the change of named entities over time and its influence on the performance of the named entity tagger. First, we analyze Japanese named entities which appear in Mainichi Newspaper articles published in 1995, 1996, 1997, 1998 and 2005. This analysis reveals that the number of named entity types and the number of named entity tokens are almost steady over time and that 70 ~ 80% of named entity types in a certain year occur in the articles published either in its succeeding year or in its preceding year. These facts lead that 20 ~ 30% of named entity types are replaced with new ones every year. The experiment against these texts shows that our proposing semi-supervised method which combines a small annotated corpus and a large unannotated corpus for training works robustly although the traditional supervised method is fragile against the change of name entity distribution.