Named entity recognition for tweets

  • Authors:
  • Xiaohua Liu;Furu Wei;Shaodian Zhang;Ming Zhou

  • Affiliations:
  • Harbin Institute of Technology, Harbin, China;Microsoft Research Asia, Beijing, China;Shanghai Jiao Tong University, Shanghai, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on twitter and microblogging services, social recommender systems, and CAMRa2010: Movie recommendation in context
  • Year:
  • 2013

Abstract

Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) a combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) a semisupervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts prelabeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semisupervised learning, together with the gazetteers, alleviates the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semisupervised learning.
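
The KNN prelabeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the positional context features, the cosine similarity, the toy training tweets, and all function names (`context_vector`, `knn_prelabel`, `token_features`) are assumptions made for illustration. The sketch only shows how a word-level KNN vote gathered across tweets could be attached to each token as one extra feature for a downstream sequence labeler such as a CRF; the CRF itself, the normalization step, and the semisupervised loop are not shown.

```python
import math
from collections import Counter

def context_vector(tokens, i, window=2):
    """Positional bag of words around token i (hypothetical feature set)."""
    vec = Counter()
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            vec[f"{offset}:{tokens[j].lower()}"] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_memory(labeled_tweets, window=2):
    """Store one (context vector, label) pair per labeled token."""
    memory = []
    for tokens, labels in labeled_tweets:
        for i, label in enumerate(labels):
            memory.append((context_vector(tokens, i, window), label))
    return memory

def knn_prelabel(vec, memory, k=3):
    """Majority label among the k stored examples most similar to vec."""
    nearest = sorted(memory, key=lambda ex: cosine(vec, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def token_features(tokens, memory, k=3):
    """Per-token feature dicts; the KNN prelabel is just one more feature."""
    features = []
    for i, token in enumerate(tokens):
        features.append({
            "word": token.lower(),
            "is_capitalized": token[:1].isupper(),
            "knn_prelabel": knn_prelabel(context_vector(tokens, i), memory, k),
        })
    return features

# Toy labeled tweets (invented for this sketch, not taken from the paper).
TOY_TRAIN = [
    (["I", "love", "Seattle", "today"], ["O", "O", "B-LOC", "O"]),
    (["Flying", "to", "Seattle", "tomorrow"], ["O", "O", "B-LOC", "O"]),
    (["Met", "Obama", "today"], ["O", "B-PER", "O"]),
]

memory = build_memory(TOY_TRAIN)
features = token_features(["Back", "in", "Seattle", "tomorrow"], memory)
for feats in features:
    print(feats)
```

In a fuller pipeline, feature dictionaries of this shape would be passed to a CRF toolkit for sequential labeling, so that the coarse cross-tweet evidence from KNN and the fine-grained within-tweet evidence are combined in a single model.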