Multilayer sequence labeling

Authors:
Ai Azuma;Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Citing 9
Cited 0

A simple automatic derivative evaluation program

Communications of the ACM
Automatic differentiation of algorithms: from simulation to optimization

Automatic differentiation of algorithms: from simulation to optimization
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

The Journal of Machine Learning Research
Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Learning with probabilistic features for improved pipeline models

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
First- and second-order expectation semirings with applications to minimum-risk training on translation forests

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we describe a novel approach to cascaded learning and inference on sequences. We propose a weakly joint learning model on cascaded inference on sequences, called multilayer sequence labeling. In this model, inference on sequences is modeled as cascaded decision. However, the decision on a sequence labeling sequel to other decisions utilizes the features on the preceding results as marginalized by the probabilistic models on them. It is not novel itself, but our idea central to this paper is that the probabilistic models on succeeding labeling are viewed as indirectly depending on the probabilistic models on preceding analyses. We also propose two types of efficient dynamic programming which are required in the gradient-based optimization of an objective function. One of the dynamic programming algorithms resembles back propagation algorithm for multilayer feed-forward neural networks. The other is a generalized version of the forward-backward algorithm. We also report experiments of cascaded part-of-speech tagging and chunking of English sentences and show effectiveness of the proposed method.