Named entity recognition with character-level models

  • Authors:
  • Dan Klein; Joseph Smarr; Huy Nguyen; Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA (all authors)

  • Venue:
  • CoNLL '03: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4
  • Year:
  • 2003

Abstract

We discuss two named-entity recognition models which use characters and character n-grams either exclusively or as an important part of their data representation. The first model is a character-level HMM with minimal context information, and the second model is a maximum-entropy conditional Markov model with substantially richer context features. Our best model achieves an overall F1 of 86.07% on the English test data (92.31% on the development data). This number represents a 25% error reduction over the same model without word-internal (substring) features.
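To illustrate the kind of word-internal (substring) features the abstract refers to, here is a minimal sketch of character n-gram feature extraction for a single token, of the sort a maximum-entropy conditional Markov model could consume. The function name, feature naming scheme, and n-gram range are illustrative assumptions, not the authors' actual feature set.

```python
# Minimal sketch (not the authors' code): character n-gram ("substring")
# features for one token, suitable as inputs to a maxent CMM tagger.
# The n-gram range and feature-string format are assumptions.

def char_ngram_features(word, n_min=2, n_max=4):
    """Return character n-gram features for a single token.

    Word boundaries are marked with '<' and '>' so that prefixes and
    suffixes (e.g. '<Mr', 'ton>') become distinct features.
    """
    padded = "<" + word + ">"
    feats = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats.append(f"char_{n}gram={padded[i:i + n]}")
    return feats


if __name__ == "__main__":
    # Example: substring features for a capitalized token.
    print(char_ngram_features("Smith"))
    # ['char_2gram=<S', 'char_2gram=Sm', ..., 'char_4gram=ith>']
```

Features like these let the model generalize from word-internal evidence (capitalization patterns, common name suffixes) to tokens never seen in training, which is the source of the error reduction reported above.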