Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automating this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. To overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations; it has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
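
The idea of folding word similarity into a kernel can be illustrated with a minimal sketch. The abstract does not specify the semantic language model or the exact kernel transformation, so everything here is an assumption made for illustration: word similarity is taken as the cosine between windowed co-occurrence vectors, and the kernel is taken as the common form K = X S Zᵀ, where X and Z hold bag-of-words sentence vectors and S is the word-by-word similarity matrix. The names `cooccurrence_similarity` and `semantic_kernel` and the toy corpus are hypothetical.

```python
import numpy as np


def cooccurrence_similarity(docs, vocab, window=2):
    """Cosine similarity between words, from windowed co-occurrence counts.

    Assumption: this stands in for the semantic language model; the
    source does not say how word similarity is actually computed.
    Out-of-vocabulary tokens are dropped before the window is applied.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for tokens in docs:
        ids = [index[t] for t in tokens if t in index]
        for pos, i in enumerate(ids):
            for j in ids[pos + 1 : pos + 1 + window]:
                counts[i, j] += 1.0
                counts[j, i] += 1.0
    # Treat each word as co-occurring with itself, so no row is all zeros.
    counts += np.eye(len(vocab))
    unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
    # A Gram matrix of unit vectors: symmetric and positive semi-definite.
    return unit @ unit.T


def semantic_kernel(X, Z, S):
    """Kernel between bag-of-words rows of X and Z, re-weighted by S."""
    return X @ S @ Z.T


# Toy unlabelled corpus (hypothetical data, tokenised sentences).
unlabelled = [
    ["protein", "binds", "protein"],
    ["protein", "interacts", "protein"],
    ["cell", "grows"],
]
vocab = ["binds", "interacts", "protein", "cell"]
S = cooccurrence_similarity(unlabelled, vocab)

# Two labelled sentences that share no words: one contains only "binds",
# the other only "interacts".
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
K = semantic_kernel(X, X, S)
print("linear kernel:", (X @ X.T)[0, 1])
print("semantic kernel:", K[0, 1])
```

Under these assumptions the two sentences have a linear-kernel value of zero but a positive semantic-kernel value, because "binds" and "interacts" appear in similar contexts in the unlabelled text. A matrix built this way could be passed to any classifier that accepts a precomputed kernel.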