Training the Hidden Vector State Model from Un-annotated Corpus

Authors:
Deyu Zhou; Yulan He; Chee Keong Kwoh
Affiliations:
School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore (all authors)
Venue:
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part II
Year:
2007

Abstract

Because most knowledge about protein-protein interactions is still buried in biological publications, there is increasing interest in automatically extracting this information from the vast biological literature. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort, whereas statistical approaches require large-scale, richly annotated corpora to estimate model parameters reliably, and such corpora are normally difficult to obtain in practice. We have proposed a hidden vector state (HVS) model for protein-protein interaction extraction. The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. State transitions are factored into a stack shift operation, similar to that of a push-down automaton, followed by the push of a new preterminal category label. In this paper, we propose a novel approach based on the k-nearest-neighbors classifier to automatically train the HVS model from un-annotated data. Experimental results show improved performance over the baseline system, in which the HVS model is trained from a small amount of annotated data.
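
The factoring of a state transition into a stack shift followed by a push, as described in the abstract, can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `hvs_transition`, the `max_depth` limit, and the labels `SS`, `PROTEIN`, and `ACTIVATE` are hypothetical names chosen only for illustration, and the probabilities the model would attach to each pop and push are left out.

```python
from typing import Tuple

# A vector state is modeled as a tuple of labels; the last element is the top of the stack.
StateVector = Tuple[str, ...]


def hvs_transition(state: StateVector, pop_count: int, new_label: str,
                   max_depth: int = 4) -> StateVector:
    """Apply one transition: pop `pop_count` labels, then push `new_label`.

    `max_depth` is an assumed bound on the stack depth, not a value from the source.
    """
    if not 0 <= pop_count <= len(state):
        raise ValueError("pop_count must be between 0 and the current stack depth")
    # Stack shift: discard the top `pop_count` labels.
    shifted = state[: len(state) - pop_count]
    # Push exactly one new preterminal category label.
    new_state = shifted + (new_label,)
    if len(new_state) > max_depth:
        raise ValueError("stack depth exceeds max_depth")
    return new_state


# Hypothetical walk through three words of a sentence.
s0: StateVector = ("SS",)
s1 = hvs_transition(s0, 0, "PROTEIN")   # push without popping
s2 = hvs_transition(s1, 1, "ACTIVATE")  # pop PROTEIN, push ACTIVATE
s3 = hvs_transition(s2, 0, "PROTEIN")   # nest a second protein under ACTIVATE
```

Because each step pushes exactly one label after the shift, the whole context is captured by the current stack contents, which is what lets the model be treated as a Markov model over vector-valued states.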