Automatically detecting articles relevant to protein-protein interactions (PPIs) is a crucial step in the large-scale curation of biological databases. Previous work adopted POS tagging, shallow parsing and sentence splitting, but achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations to further improve performance on the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and term weighting methods, we also explored domain-dependent features, namely PPI trigger keywords, protein named entities and more advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features was evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced use of NLP output and the integration of bag-of-words and NLP output improved text classification performance. Specifically, compared with the best performance achieved in the BioCreAtIvE II Interaction Article Subtask (IAS), feature-level and classifier-level integration of multiple features improved classification performance by 2.71% and 3.95%, respectively.
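To make the difference between feature-level and classifier-level integration concrete, here is a minimal, self-contained sketch. It is not the authors' system, and several pieces are assumptions: tf-idf (with the smoothed idf `ln((1+n)/(1+df)) + 1`) stands in for "term weighting methods", the `TRIGGERS` set is a hypothetical keyword list, plain logistic regression is swapped in as the classifier, averaging the two models' probabilities is just one possible classifier-level combination rule, and the four training sentences are invented toy data rather than BioCreAtIvE II articles. Protein named-entity and NLP-output features are omitted for brevity.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL trigger list for illustration; the abstract does not give
# the actual PPI trigger keywords used in the paper.
TRIGGERS = {"interact", "interacts", "interaction", "binds", "binding",
            "complex", "phosphorylates", "associates"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed idf: ln((1 + n) / (1 + df)) + 1 for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def bow_tfidf(tokens, vocab, idf):
    """Domain-independent bag-of-words vector, tf-idf weighted and L2-normalised."""
    tf = Counter(tokens)
    vec = [tf[t] * idf[t] for t in vocab]  # out-of-vocabulary tokens are ignored
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def trigger_features(tokens):
    """Domain-dependent features: trigger present (0/1) and trigger density."""
    hits = sum(1 for t in tokens if t in TRIGGERS)
    return [1.0 if hits else 0.0, hits / max(len(tokens), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Unregularised logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy corpus (invented, NOT BioCreAtIvE II data): 1 = PPI-relevant, 0 = not.
docs = ["Protein A binds protein B and forms a complex",
        "We show that X interacts with Y in yeast",
        "The gene is expressed in liver tissue",
        "Mutation of the gene causes disease in mice"]
labels = [1, 1, 0, 0]
toks = [tokenize(d) for d in docs]
idf = build_idf(toks)
vocab = sorted(idf)
bow = [bow_tfidf(t, vocab, idf) for t in toks]
trig = [trigger_features(t) for t in toks]

# Feature-level integration: concatenate both representations, train one model.
feat_model = train_logreg([b_ + t_ for b_, t_ in zip(bow, trig)], labels)

# Classifier-level integration: one model per representation, average the scores.
bow_model = train_logreg(bow, labels)
trig_model = train_logreg(trig, labels)


def feature_level_score(tokens):
    return predict(feat_model, bow_tfidf(tokens, vocab, idf) + trigger_features(tokens))


def classifier_level_score(tokens):
    return (predict(bow_model, bow_tfidf(tokens, vocab, idf))
            + predict(trig_model, trigger_features(tokens))) / 2


pos = tokenize("Protein X interacts with protein Y")
neg = tokenize("The gene causes disease")
print(f"feature-level:    {feature_level_score(pos):.3f} vs {feature_level_score(neg):.3f}")
print(f"classifier-level: {classifier_level_score(pos):.3f} vs {classifier_level_score(neg):.3f}")
```

The design choice being illustrated is where the representations meet: feature-level integration lets a single model weigh bag-of-words and trigger features jointly in one vector, whereas classifier-level integration trains an independent model per representation and merges only their output scores, so each representation can be modelled (and weighted) separately.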