Japanese Named Entity extraction with redundant morphological analysis

Authors:
Masayuki Asahara;Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology, Japan;Nara Institute of Science and Technology, Japan
Venue:
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Year:
2003

Abstract

Named Entity (NE) extraction is an important subtask of document processing applications such as information extraction and question answering. A typical method for NE extraction from Japanese text is a cascade of morphological analysis, POS tagging and chunking. However, in some cases the segmentation granularity of the morphological analysis result does not match the building units of NEs, so that extraction of some NEs is inherently impossible in this setting. To cope with this unit problem, we propose a character-based chunking method. First, the input sentence is analyzed redundantly by a statistical morphological analyzer to produce multiple (n-best) answers. Then, each character is annotated with its character types and with its possible POS tags from the n-best answers. Finally, a support vector machine-based chunker picks out portions of the input sentence as NEs. This method gives the chunker richer information than previous methods that are based on a single morphological analysis result. We apply our method to the IREX NE extraction task. In cross-validation the method achieves an F-measure of 87.2, which shows its superiority and effectiveness.
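
The annotation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact feature encoding, so the coarse character-type inventory, the B/I/E/S position-in-word tags, the `NOUN` labels, and the function names `char_type`, `position_tags` and `annotate` are all assumptions made for illustration. The n-best analyses are written by hand here; in the described method they would come from a statistical morphological analyzer, and the resulting per-character features would be passed to the SVM-based chunker.

```python
import unicodedata


def char_type(ch):
    """Coarse character type taken from the Unicode character name.

    The type inventory is an assumption, not the one used in the paper.
    """
    name = unicodedata.name(ch, "")
    if "HIRAGANA" in name:
        return "HIRAGANA"
    if "KATAKANA" in name:
        return "KATAKANA"
    if "CJK UNIFIED IDEOGRAPH" in name:
        return "KANJI"
    if ch.isdigit():
        return "DIGIT"
    if ch.isalpha():
        return "ALPHA"
    return "OTHER"


def position_tags(length):
    """Position-in-word tags for a word of the given length (assumed scheme)."""
    if length == 1:
        return ["S"]  # single-character word
    return ["B"] + ["I"] * (length - 2) + ["E"]  # begin, inside, end


def annotate(sentence, analyses):
    """Annotate every character with its type and one POS tag per analysis.

    `analyses` is a list of segmentations, best first; each segmentation is a
    list of (surface, pos) pairs whose surfaces concatenate to `sentence`.
    """
    features = [{"char": ch, "type": char_type(ch), "pos": []} for ch in sentence]
    for analysis in analyses:
        if "".join(surface for surface, _ in analysis) != sentence:
            raise ValueError("analysis does not cover the sentence")
        i = 0
        for surface, pos in analysis:
            for tag in position_tags(len(surface)):
                features[i]["pos"].append(f"{tag}-{pos}")
                i += 1
    return features


# Hypothetical 2-best output for a short phrase. The best analysis keeps
# "日米" as one word; the second splits it into two one-character words.
sentence = "日米首脳"
analyses = [
    [("日米", "NOUN"), ("首脳", "NOUN")],
    [("日", "NOUN"), ("米", "NOUN"), ("首脳", "NOUN")],
]

for feature in annotate(sentence, analyses):
    print(feature["char"], feature["type"], feature["pos"])
```

In this hand-made example, only the second analysis marks "日" and "米" as separate units. A chunker that sees a single analysis would have no word boundary between them, whereas the character-level features carry both alternatives.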