With the advance of high-throughput genomics and proteomics technologies, it has become critical to mine and curate protein-protein interaction (PPI) networks from the biological research literature. Several PPI knowledge bases have been curated by domain experts, but they are far from comprehensive. Observing that PPI-relevant documents can be obtained from PPI knowledge bases that record literature evidence, and that a large number of unlabeled documents (mostly negative) are freely available, we investigated learning from positive and unlabeled data (LPU) and developed an automated system for retrieving PPI-relevant articles, with the aim of assisting the curation of a bacterial PPI knowledge base, MPIDB. Two approaches to obtaining unlabeled documents were used: one based on a PubMed MeSH term search and the other based on an existing knowledge base, UniProtKB. We found that unlabeled documents obtained from UniProtKB tend to yield better document classifiers for PPI curation purposes. Our study shows that LPU is a feasible scenario for developing an automated system to retrieve PPI-relevant articles, one that requires no extra annotation effort. The choice of machine learning algorithm and the choice of unlabeled documents would both be critical in constructing an effective LPU-based system.