IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
The extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the most studied topics in text mining for the life sciences, as such algorithms can substantially decrease the effort for database curators. The currently best methods for this task are based on analyzing the dependency tree (DT) representation of sentences. Many approaches exploit only topological features and thus do not fully exploit the information contained in DTs. Using a pattern-matching approach, we show that incorporating the grammatical information encoded in the dependency types of DTs noticeably improves extraction performance. We automatically infer a large set of linguistic patterns using only information about interacting proteins. Patterns are then refined based on shallow linguistic features and the semantics of dependency types. Together, these refinements lead to a total improvement of 17.2 percentage points in F1, as evaluated on five publicly available PPI corpora. More than half of that improvement is gained by properly handling dependency types. Our method provides a general framework for building task-specific relationship extraction methods that do not require annotated training data. Furthermore, our observations suggest ways to improve existing relation extraction approaches.
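The difference between topological and type-aware patterns can be illustrated with a minimal sketch. It is not the authors' implementation: the toy trees, the placeholder tokens `PROT_A` and `PROT_B`, the dependency labels, and the helper names `shortest_path` and `untyped` are all assumptions chosen for illustration. A pattern is taken here to be the shortest path between two protein mentions in the dependency tree.

```python
from collections import deque

# Hypothetical toy dependency trees as (head, dependency_type, dependent) triples.
# "PROT_A binds PROT_B" (an interaction)
INTERACTION = [("binds", "nsubj", "PROT_A"), ("binds", "dobj", "PROT_B")]
# "PROT_A is expressed in PROT_B ..." (no interaction stated)
CO_MENTION = [("expressed", "nsubjpass", "PROT_A"), ("expressed", "prep_in", "PROT_B")]


def shortest_path(edges, source, target):
    """Return the shortest path as a list of (dep_type, direction) steps.

    Direction is "up" when moving from dependent to head, "down" otherwise.
    Returns None if the two tokens are not connected.
    """
    graph = {}
    for head, dep, child in edges:
        graph.setdefault(head, []).append((child, dep, "down"))
        graph.setdefault(child, []).append((head, dep, "up"))
    queue = deque([(source, [])])
    seen = {source}
    while queue:  # breadth-first search yields a shortest path
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, dep, direction in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(dep, direction)]))
    return None


def untyped(path):
    """Drop dependency types, keeping only the topology (edge directions)."""
    return tuple(direction for _, direction in path)


path_a = shortest_path(INTERACTION, "PROT_A", "PROT_B")
path_b = shortest_path(CO_MENTION, "PROT_A", "PROT_B")

# A purely topological comparison cannot tell the two sentences apart,
# while a comparison that includes dependency types can.
topology_matches = untyped(path_a) == untyped(path_b)
typed_matches = path_a == path_b
```

In this toy case both paths share the same up-then-down shape, so a pattern learned from the first sentence would also fire on the second unless the dependency types are compared as well.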