A composite kernel for named entity recognition

Authors:
Sujan Kumar Saha;Shashi Narayan;Sudeshna Sarkar;Pabitra Mitra
Affiliations:
Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India;Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India;Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India;Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
Venue:
Pattern Recognition Letters
Year:
2010

Abstract

In this paper, we propose a novel kernel function for support vector machines (SVMs) that can be used for sequence labeling tasks such as named entity recognition (NER). Machine learning methods such as support vector machines, maximum entropy models, hidden Markov models and conditional random fields are the most widely used methods for implementing NER systems. The features used by machine learning algorithms for NER are mostly string-based. The proposed kernel is based on a novel distance function computed between these string-based features. In tasks like NER, both the similarity between contexts and the semantic similarity between words play an important role, and the goal is to capture this context and semantic information. The proposed distance function makes use of statistics derived primarily from the training data and of hierarchical clustering information. The kernel function is applied to Hindi and biomedical NER tasks, and the results are promising.
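
The abstract does not give the form of the distance function, so the following Python sketch is only a generic illustration of the idea and not the authors' kernel. It assumes a hypothetical table `CLUSTER_PATHS` of bit-string paths from a binary hierarchical word clustering (the entries are made up), a hypothetical per-word distance based on shared path prefixes, and a Gaussian-style mapping from distance to similarity. None of these names or choices come from the paper, and a kernel built this way from an arbitrary distance is not guaranteed to be positive semi-definite.

```python
import math

# Hypothetical, made-up cluster paths: each word maps to a bit string giving
# its position in a binary hierarchical clustering of the vocabulary.
CLUSTER_PATHS = {
    "delhi": "0010",
    "mumbai": "0011",
    "in": "0111",
    "at": "0110",
    "protein": "1100",
    "gene": "1101",
}


def word_distance(a, b, paths=CLUSTER_PATHS):
    """Distance in [0, 1] between two string features (assumed form).

    Identical words are at distance 0. Words that share a long prefix of
    their cluster paths are close; words missing from the table are treated
    as maximally distant from everything except themselves.
    """
    if a == b:
        return 0.0
    pa, pb = paths.get(a), paths.get(b)
    if pa is None or pb is None:
        return 1.0
    common = 0
    for ca, cb in zip(pa, pb):
        if ca != cb:
            break
        common += 1
    return 1.0 - common / max(len(pa), len(pb))


def context_distance(x, y):
    """Mean per-position word distance between two equal-length contexts."""
    if len(x) != len(y):
        raise ValueError("contexts must have the same window length")
    if not x:
        return 0.0
    return sum(word_distance(a, b) for a, b in zip(x, y)) / len(x)


def distance_kernel(x, y, gamma=1.0):
    """Gaussian-style similarity derived from the context distance."""
    d = context_distance(x, y)
    return math.exp(-gamma * d * d)


if __name__ == "__main__":
    same_role = distance_kernel(("in", "delhi"), ("at", "mumbai"))
    diff_role = distance_kernel(("in", "delhi"), ("in", "protein"))
    print(same_role, diff_role)
```

In this toy setup, two contexts whose words fall in nearby clusters score higher than two contexts that share one exact word but differ in the cluster of the other. A function like `distance_kernel` could then be used to fill a precomputed Gram matrix for an SVM implementation that accepts one.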