Supervised methods for learning sequence classifiers rely on the availability of large amounts of labeled data. In many applications, however, the high cost and effort of labeling mean that the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is growing interest in semi-supervised methods that can exploit large amounts of unlabeled data together with small amounts of labeled data. In this paper, we introduce a novel Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised learning and investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; and (ii) an expectation maximization (EM) based approach to semi-supervised training of MMs, which does make use of unlabeled data. The results of our experiments on three protein subcellular localization prediction tasks show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; and (ii) are more accurate than both the MMs and the EM-based semi-supervised MMs.
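To make the supervised baseline concrete, below is a minimal sketch of a Markov model (MM) sequence classifier of the kind the paper compares against: one first-order, Laplace-smoothed transition table per class, with prediction by class-conditional log-likelihood. This is an illustrative toy implementation, not the paper's AAMM or its exact MM; all names and the smoothing choice are assumptions.

```python
# Minimal first-order Markov model classifier (illustrative baseline,
# not the paper's AAMM). One Laplace-smoothed transition table per class.
from collections import defaultdict
import math


class MarkovClassifier:
    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.counts = {}                 # class -> (prev, cur) -> count
        self.totals = {}                 # class -> prev symbol -> total count
        self.priors = defaultdict(int)   # class -> number of training sequences

    def fit(self, sequences, labels):
        """Count class-conditional transitions from labeled sequences."""
        for seq, y in zip(sequences, labels):
            self.priors[y] += 1
            cnt = self.counts.setdefault(y, defaultdict(int))
            tot = self.totals.setdefault(y, defaultdict(int))
            for prev, cur in zip(seq, seq[1:]):
                cnt[(prev, cur)] += 1
                tot[prev] += 1

    def log_likelihood(self, seq, y):
        """Log P(y) + sum of Laplace-smoothed log transition probabilities."""
        cnt, tot = self.counts[y], self.totals[y]
        v = len(self.alphabet)
        ll = math.log(self.priors[y] / sum(self.priors.values()))
        for prev, cur in zip(seq, seq[1:]):
            ll += math.log((cnt[(prev, cur)] + 1) / (tot[prev] + v))
        return ll

    def predict(self, seq):
        """Return the class with the highest class-conditional log-likelihood."""
        return max(self.counts, key=lambda y: self.log_likelihood(seq, y))
```

A semi-supervised variant in the EM style the abstract mentions would, roughly, iterate between predicting soft labels for the unlabeled sequences and re-estimating the transition counts with those soft labels folded in; the AAMM approach instead augments the model with learned abstractions over the alphabet.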