A composite kernel for named entity recognition

  • Authors:
  • Sujan Kumar Saha; Shashi Narayan; Sudeshna Sarkar; Pabitra Mitra

  • Affiliations:
  • Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (all authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010

Abstract

In this paper, we propose a novel kernel function for support vector machines (SVMs) that can be used for sequence labeling tasks such as named entity recognition (NER). Machine learning methods such as SVMs, maximum entropy models, hidden Markov models, and conditional random fields are the most widely used approaches for building NER systems. The features these algorithms use for NER are mostly string-based. The proposed kernel is built on a novel distance function between string-based features. In tasks like NER, both the similarity between contexts and the semantic similarity between words play an important role, and the distance function is designed to capture this contextual and semantic information. It relies on statistics derived primarily from the training data and on hierarchical clustering information. We apply the kernel to Hindi and biomedical NER tasks, and the results are promising.
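
The abstract does not give the distance function itself, so the following is only a minimal sketch of the general idea: turning a distance between string-based feature vectors into a kernel value. Everything in it is an assumption made for illustration and not the authors' formulation: the per-position distance (half the L1 distance between class distributions estimated from training data), the exponential mapping exp(-gamma * d), the toy data, and all function names. The hierarchical clustering information mentioned in the abstract is omitted, and positive semidefiniteness is not guaranteed for an arbitrary distance.

```python
import math
from collections import Counter, defaultdict


def class_distributions(samples, labels):
    """Estimate P(label | value) for each feature position from training data."""
    classes = sorted(set(labels))
    counts = [defaultdict(Counter) for _ in range(len(samples[0]))]
    for features, label in zip(samples, labels):
        for pos, value in enumerate(features):
            counts[pos][value][label] += 1
    tables = []
    for pos_counts in counts:
        table = {}
        for value, counter in pos_counts.items():
            total = sum(counter.values())
            table[value] = [counter[c] / total for c in classes]
        tables.append(table)
    return tables


def value_distance(table, a, b):
    """Hypothetical distance in [0, 1] between two string values at one position."""
    if a == b:
        return 0.0
    if a not in table or b not in table:
        return 1.0  # assumption: an unseen value is maximally distant
    # Half the L1 distance between the two class distributions.
    return 0.5 * sum(abs(p - q) for p, q in zip(table[a], table[b]))


def distance(tables, x, y):
    """Sum of per-position distances between two feature vectors."""
    return sum(value_distance(t, a, b) for t, a, b in zip(tables, x, y))


def kernel(tables, x, y, gamma=1.0):
    """Map the distance to a similarity; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * distance(tables, x, y))


# Toy training data (invented): (previous word, current word, next word).
samples = [
    ("Dr", "Sharma", "said"),
    ("Mr", "Gupta", "said"),
    ("in", "Delhi", "today"),
    ("in", "Mumbai", "today"),
]
labels = ["PER", "PER", "LOC", "LOC"]

tables = class_distributions(samples, labels)
same_class = kernel(tables, samples[0], samples[1])
diff_class = kernel(tables, samples[0], samples[2])
print(same_class, diff_class)
```

In this toy setting, two different strings that behave identically with respect to the labels (for example "Dr" and "Mr") are treated as close, which a plain exact-match comparison of strings would miss. A full Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels.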