Discovering patterns to extract protein-protein interactions from full texts

Authors:
Minlie Huang;Xiaoyan Zhu;Yu Hao;Donald G. Payan;Kunbin Qu;Ming Li
Affiliations:
State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China;State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China;State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China;Rigel Pharmaceuticals Inc, 1180 Veterans Blvd, South San Francisco, CA 94080, USA;Rigel Pharmaceuticals Inc, 1180 Veterans Blvd, South San Francisco, CA 94080, USA;Bioinformatics Laboratory, School of Computer Science, University of Waterloo, Ontario, N2L 3G1, Canada
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Although several databases store protein-protein interactions, most such data still exist only in the scientific literature. They are scattered across articles written in natural language, which makes them hard to mine automatically, and extracting protein pathways from the literature by hand takes considerable time and labor. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.

Results: We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is then used to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall of 80.0% and a precision of 80.5%.

Availability: The program is available on request from the authors.
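
The idea of deriving a pattern by aligning sentences with dynamic programming can be illustrated with a small sketch. This is not the authors' algorithm, whose scoring scheme and sentence representation are not given in the abstract. It is a generic global alignment (Needleman-Wunsch style) over word tokens, with assumed scores (match 2, mismatch -1, gap -1), an assumed `PROTEIN` placeholder standing in for dictionary-tagged protein names, and hypothetical function names `align_pattern` and `matches`.

```python
from typing import List

WILDCARD = "*"  # assumed notation for "any run of tokens"


def align_pattern(a: List[str], b: List[str],
                  match: int = 2, mismatch: int = -1, gap: int = -1) -> List[str]:
    """Globally align two token lists and return their shared pattern.

    Tokens aligned as exact matches are kept; every run of mismatches
    or gaps is collapsed into a single wildcard.
    """
    n, m = len(a), len(b)
    # score[i][j] = best alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pattern: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and a[i - 1] == b[j - 1]
        if same and score[i][j] == score[i - 1][j - 1] + match:
            pattern.append(a[i - 1])
            i, j = i - 1, j - 1
            continue
        if not pattern or pattern[-1] != WILDCARD:
            pattern.append(WILDCARD)
        if i > 0 and j > 0 and not same and score[i][j] == score[i - 1][j - 1] + mismatch:
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pattern.reverse()
    return pattern


def matches(pattern: List[str], tokens: List[str]) -> bool:
    """Return True if the tokens fit the pattern; a wildcard absorbs zero or more tokens."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == WILDCARD:
        return any(matches(rest, tokens[k:]) for k in range(len(tokens) + 1))
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


s1 = "PROTEIN binds to PROTEIN in vitro".split()
s2 = "PROTEIN binds directly to PROTEIN".split()
pattern = align_pattern(s1, s2)
print(" ".join(pattern))
```

A pattern obtained this way can then be tested against unseen sentences with `matches`, for example `matches(pattern, "PROTEIN binds strongly to PROTEIN".split())`. A real system would additionally need protein name tagging, pattern filtering, and a scored rather than exact match.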