Motivation: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. Although of great value, such information is limited in databases owing to the laborious process of literature-based curation. Computational literature mining holds promise to facilitate database curation.

Results: A rule-based system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was used to extract protein phosphorylation information from MEDLINE abstracts. An annotation-tagged literature corpus developed at PIR was used to evaluate the system for finding phosphorylation papers and extracting phosphorylation objects (kinases, substrates and sites) from abstracts. RLIMS-P achieved a precision of 91.4% and a recall of 96.4% for paper retrieval, and a precision of 97.9% and a recall of 88.0% for extraction of substrates and sites. By coupling high recall for paper retrieval with high precision for information extraction, RLIMS-P facilitates literature mining and database annotation of protein phosphorylation.

Availability: The program is available on request from the authors. The phosphorylation patterns and datasets used in this study are available at http://pir.georgetown.edu/iprolink/

Contact: zh9@georgetown.edu
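To make the idea of rule-based extraction and its evaluation concrete, the following is a minimal sketch and not the RLIMS-P implementation. The single pattern, the function names `extract_phosphorylation` and `precision_recall`, and the example sentences are all assumptions introduced for illustration; the actual RLIMS-P patterns are those distributed by the authors.

```python
import re

# Hypothetical rule (an assumption, not an RLIMS-P pattern):
# "<kinase> phosphorylates <substrate> [at|on <site>]"
PATTERN = re.compile(
    r"(?P<kinase>[A-Za-z0-9-]+)\s+phosphorylates\s+"
    r"(?P<substrate>[A-Za-z0-9-]+)"
    r"(?:\s+(?:at|on)\s+(?P<site>(?:Ser|Thr|Tyr)-?\d+))?"
)


def extract_phosphorylation(sentence):
    """Return (kinase, substrate, site) tuples; site is None when absent."""
    return [
        (m.group("kinase"), m.group("substrate"), m.group("site"))
        for m in PATTERN.finditer(sentence)
    ]


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Under these assumptions, `extract_phosphorylation("PKA phosphorylates CREB at Ser133.")` yields one tuple with the kinase, substrate and site filled in, and `precision_recall` compares a set of such tuples with a hand-annotated set, which mirrors how precision and recall figures like those above are defined.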