A Named Entity Recognition Method Based on Decomposition and Concatenation of Word Chunks

Authors: Tomoya Iwakura; Hiroya Takamura; Manabu Okumura
Affiliations: Fujitsu Laboratories Ltd.; Tokyo Institute of Technology; Tokyo Institute of Technology
Venue: ACM Transactions on Asian Language Information Processing (TALIP)
Year: 2013

Abstract

We propose a named entity (NE) recognition method that repeatedly decomposes and concatenates word chunks. The method first identifies word chunks with a base chunker, such as a noun phrase chunker, and then recognizes NEs in the resulting chunk sequence. Working with word chunks yields features that word-sequence-based recognition methods cannot obtain, such as the first word and the last word of a chunk. However, a word chunk may contain only part of an NE, or it may contain multiple NEs. To solve this problem, we use four operators: SHIFT separates the first word from a word chunk, POP separates the last word from a word chunk, JOIN concatenates two word chunks, and REDUCE assigns an NE label to a word chunk. We evaluate our method on a Japanese NE recognition dataset that includes about 200,000 annotations of 191 types of NEs from over 8,500 news articles. The experimental results show that our method trains and processes text faster than a linear-chain structured perceptron and a semi-Markov perceptron, while maintaining high accuracy.
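
The four operators can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Chunk` type, the function names, the English example sentence, and the `PERSON` and `LOCATION` labels are all assumptions made for illustration. The operator sequence is also chosen by hand here, whereas the paper's method would select operators with a trained model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    """A word chunk; `label` is None until an NE label is assigned."""
    words: Tuple[str, ...]
    label: Optional[str] = None


def shift(chunks: List[Chunk], i: int) -> List[Chunk]:
    """SHIFT: separate the first word from chunk i."""
    c = chunks[i]
    if len(c.words) < 2:
        return list(chunks)  # a single-word chunk cannot be split
    return chunks[:i] + [Chunk(c.words[:1]), Chunk(c.words[1:])] + chunks[i + 1:]


def pop(chunks: List[Chunk], i: int) -> List[Chunk]:
    """POP: separate the last word from chunk i."""
    c = chunks[i]
    if len(c.words) < 2:
        return list(chunks)
    return chunks[:i] + [Chunk(c.words[:-1]), Chunk(c.words[-1:])] + chunks[i + 1:]


def join(chunks: List[Chunk], i: int) -> List[Chunk]:
    """JOIN: concatenate chunk i with chunk i + 1."""
    if i + 1 >= len(chunks):
        return list(chunks)  # no right neighbour to join with
    merged = Chunk(chunks[i].words + chunks[i + 1].words)
    return chunks[:i] + [merged] + chunks[i + 2:]


def reduce_label(chunks: List[Chunk], i: int, label: str) -> List[Chunk]:
    """REDUCE: assign an NE label to chunk i."""
    return chunks[:i] + [Chunk(chunks[i].words, label)] + chunks[i + 1:]


# Hypothetical base-chunker output: the first chunk holds an NE plus an
# extra word, and the NE "New York" is split across the last two chunks.
chunks = [
    Chunk(("President", "Barack", "Obama")),
    Chunk(("visited",)),
    Chunk(("New",)),
    Chunk(("York", "today")),
]

state = shift(chunks, 0)                     # ("President",), ("Barack", "Obama"), ...
state = reduce_label(state, 1, "PERSON")     # label ("Barack", "Obama")
state = pop(state, 4)                        # ("York", "today") -> ("York",), ("today",)
state = join(state, 3)                       # ("New",) + ("York",) -> ("New", "York")
result = reduce_label(state, 3, "LOCATION")  # label ("New", "York")

for chunk in result:
    print(" ".join(chunk.words), chunk.label)
```

The example covers both cases the abstract describes: a chunk that contains more than the NE (handled with SHIFT or POP) and an NE that spans several chunks (handled with JOIN), with REDUCE applied once a chunk matches an NE exactly.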