Named entity recognition for tweets

Authors:
Xiaohua Liu;Furu Wei;Shaodian Zhang;Ming Zhou
Affiliations:
Harbin Institute of Technology, Harbin, China;Microsoft Research Asia, Beijing, China;Shanghai Jiao Tong University, Shanghai, China;Microsoft Research Asia, Beijing, China
Venue:
ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on twitter and microblogging services, social recommender systems, and CAMRa2010: Movie recommendation in context
Year:
2013

Citing 29
Cited 0

Class-based n-gram models of natural language

Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Named entity recognition using an HMM-based chunk tagger

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Ranking algorithms for named-entity extraction: boosting and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging with a cyclic dependency network

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
An improved error model for noisy channel spelling correction

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A simple rule-based part of speech tagger

HLT '91 Proceedings of the workshop on Speech and Natural Language
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Information extraction from voicemail transcripts

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
A robust risk minimization based named entity recognition system

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
An effective two-stage model for exploiting non-local dependencies in named entity recognition

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Extracting personal names from email: applying named entity recognition to informal text

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Reranking for biomedical named-entity recognition

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Design challenges and misconceptions in named entity recognition

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Domain adaptation with latent semantic association for named entity recognition

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Locating complex named entities in web text

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
An unsupervised model for text message normalization

CALC '09 Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
Unsupervised named-entity extraction from the Web: An experimental study

Artificial Intelligence
Annotating and recognising named entities in clinical notes

ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
Nested named entity recognition

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Domain adaptive bootstrapping for named entity recognition

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Minimally-supervised extraction of entities from text advertisements

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Annotating named entities in Twitter data with crowdsourcing

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Domain adaptation of rule-based annotators for named-entity recognition tasks

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Lexical normalisation of short text messages: makn sens a #twitter

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) semisupervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts prelabeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semisupervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semisupervised learning.