Neural networks letter: Training the max-margin sequence model with the relaxed slack variables

  • Authors:
  • Lingfeng Niu; Jianmin Wu; Yong Shi

  • Affiliations:
  • Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Yahoo! Research & Development (Beijing), Tsinghua Science Park, Beijing, 100084, China; Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, 100190, China and College of Information Science and Technology, University of Nebraska at Omaha, Omaha, ...

  • Venue:
  • Neural Networks
  • Year:
  • 2012


Abstract

Sequence models are widely used in applications such as natural language processing, information extraction and optical character recognition. In this paper, we propose a new approach to training the max-margin sequence model by relaxing its slack variables. With the canonical feature mapping, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with state-of-the-art solutions for sequence learning, the new method has the following advantages: first, the sequence training problem is transformed into a multiclass classification problem, which is more widely studied and already has quite a few off-the-shelf training packages; second, the new approach reduces the training complexity significantly while achieving prediction performance comparable to that of existing sequence models; third, when the size of the training data is limited, assigning different slack variables to different microlabel pairs lets the method use the discriminative information more frugally and produce a more reliable model; last but not least, by employing kernels in the intermediate multiclass SVM, nonlinear feature spaces can be explored easily. Experimental results on named entity recognition, information extraction and handwritten letter recognition with public datasets illustrate the efficiency and effectiveness of our method.
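The reduction the abstract describes — turning sequence training into a multiclass classification problem over microlabel pairs — can be sketched as follows. This is a minimal illustration, not the paper's method: the feature representation and the `sequence_to_multiclass` helper are assumptions standing in for the paper's canonical feature mapping, and the resulting examples would then be fed to any off-the-shelf multiclass SVM package.

```python
# Hedged sketch: each position in a labelled sequence becomes one
# multiclass example whose features combine the local observation with
# the previous microlabel, and whose target class is the current
# microlabel. A standard multiclass SVM can then be trained directly
# on these examples.

START = "<s>"  # illustrative sentinel microlabel for sequence start


def sequence_to_multiclass(tokens, labels):
    """Turn one labelled sequence into (features, class) examples.

    The feature dict here is a toy stand-in for the paper's canonical
    feature mapping; the key point is that the previous microlabel is
    folded into the input so that pairwise (transition) information is
    preserved in a flat multiclass problem.
    """
    examples = []
    prev = START
    for tok, lab in zip(tokens, labels):
        features = {"token": tok, "prev_label": prev}
        examples.append((features, lab))
        prev = lab
    return examples


# Toy named-entity sequence, mirroring the NER task in the experiments.
tokens = ["John", "lives", "in", "Paris"]
labels = ["PER", "O", "O", "LOC"]
data = sequence_to_multiclass(tokens, labels)
```

Each element of `data` is then a single training example for the intermediate multiclass SVM; kernels applied at that stage give the nonlinear feature spaces mentioned above.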