Motivation: The living cell is a complex machine that depends on the proper functioning of its numerous parts, including proteins. Understanding protein functions and how proteins modify and regulate each other is the next great challenge for life-sciences researchers. The collective knowledge about protein functions and pathways is scattered throughout numerous publications in scientific journals. Bringing the relevant information together has become a bottleneck in the research and discovery process. The volume of such information grows exponentially, which renders manual curation impractical. As a viable alternative, automated literature processing tools could be employed to extract and organize biological data into a knowledge base, making it amenable to computational analysis and data mining.

Results: We present MedScan, a completely automated natural language processing-based information extraction system. We have used MedScan to extract 2976 interactions between human proteins from MEDLINE abstracts dated after 1988. The precision of the extracted information was found to be 91%. Comparison with the existing protein interaction databases BIND and DIP revealed that 96% of the extracted information is novel. The recall rate of MedScan was found to be 21%. Additional experiments with MedScan suggest that MEDLINE is a unique source of diverse protein function information, which can be extracted in a completely automated way with reasonably high precision. Directions for further improvement of the MedScan technology are discussed.

Availability: MedScan is available for commercial licensing from Ariadne Genomics, Inc.
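
To put the reported precision and recall side by side, the short sketch below combines them into an F1 score and estimates how many of the extracted interactions would be correct at the stated precision. The F1 value and the estimated count are derived here for illustration only; neither is reported in the abstract, and the function and variable names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
extracted = 2976   # interactions extracted from MEDLINE abstracts
precision = 0.91   # fraction of extracted interactions judged correct
recall = 0.21      # fraction of interactions in the text that were found

# Derived values (assumptions for illustration, not reported in the abstract).
estimated_correct = round(extracted * precision)
f1 = f1_score(precision, recall)

print(f"Estimated correct interactions: {estimated_correct}")
print(f"F1 score: {f1:.3f}")
```

The low F1 relative to the precision reflects the trade-off described in the abstract: the system favors reliable extractions over exhaustive coverage.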