IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
We propose a method for the automated extraction of protein-protein interactions from scientific text. Our system matches sentences against syntax patterns that typically describe protein interactions. We define a set of 22 patterns, each a regular expression consisting of anchor positions and parameterizable constraints. This small set is then refined and optimized on a training set using a genetic algorithm. No heuristic definitions are necessary, and the final pattern set can be generated entirely without manual curation. Our method can be applied to any protein-protein interaction system based on syntax patterns and thus complements related work on building comprehensive sets of such patterns. Applying different fitness functions during evolution provides an easy way to tune the system toward precision, recall, or F-measure. We evaluate our system on two samples, one derived from the BioCreAtIvE corpus, the other from references in the DIP. The automatic refinement of patterns adds up to 16% to the precision and 5% to the recall of our system. We additionally study the impact of proper protein name recognition, which could improve precision by about 17% and recall by 12%.
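To make the idea of evolving parameterized patterns concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hypothetical pattern template with two anchors of type `PROT`, one interaction-verb anchor, and two tunable constraints (the maximum number of filler words on either side of the verb). The verb list, the toy training sentences, the genetic operators, and all function names are illustrative assumptions; the abstract does not specify any of them.

```python
import random
import re

# Hypothetical interaction verbs; the real pattern set is not given in the abstract.
VERBS = r"(?:binds|activates|interacts with)"

def build_pattern(gap_before, gap_after):
    """Anchors: PROT, verb, PROT. Constraints: max filler words between anchors."""
    return re.compile(
        rf"PROT (?:\w+ ){{0,{gap_before}}}{VERBS} (?:\w+ ){{0,{gap_after}}}PROT"
    )

def scores(params, data):
    """Precision, recall and F-measure of one parameter setting on labeled sentences."""
    pattern = build_pattern(*params)
    tp = fp = fn = 0
    for sentence, is_interaction in data:
        hit = pattern.search(sentence) is not None
        if hit and is_interaction:
            tp += 1
        elif hit:
            fp += 1
        elif is_interaction:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}

def evolve(data, objective="f_measure", generations=20, pop_size=12,
           max_gap=6, seed=0):
    """Toy elitist genetic algorithm over (gap_before, gap_after)."""
    rng = random.Random(seed)
    fitness = lambda p: scores(p, data)[objective]
    # Start from the strictest setting (0, 0) plus random variants.
    population = [(0, 0)] + [
        (rng.randint(0, max_gap), rng.randint(0, max_gap))
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [a[0], b[1]]  # one-point crossover
            if rng.random() < 0.3:  # mutation: shift one constraint by +/-1
                i = rng.randrange(2)
                child[i] = min(max_gap, max(0, child[i] + rng.choice((-1, 1))))
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)

# Toy training data: protein names are assumed to be pre-tagged as PROT.
TRAIN = [
    ("PROT binds PROT", True),
    ("PROT directly binds to PROT", True),
    ("PROT strongly activates the kinase PROT", True),
    ("PROT interacts with PROT", True),
    ("PROT is expressed where another factor binds DNA near PROT", False),
    ("PROT and PROT are expressed", False),
]

if __name__ == "__main__":
    for objective in ("precision", "recall", "f_measure"):
        best = evolve(TRAIN, objective)
        print(objective, best, scores(best, TRAIN))
```

Swapping the `objective` argument plays the role of choosing a different fitness function: loose gap constraints favor recall, strict ones favor precision, and the F-measure rewards settings that balance the two. Replacing the `PROT` placeholders in the input relies on a separate protein name recognition step, which this sketch takes as given.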