Collaborative text-annotation resource for disease-centered relation extraction from biomedical text

Authors:
C. Cano;T. Monaghan;A. Blanco;D. P. Wall;L. Peshkin
Affiliations:
Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain;Center for Biomedical Informatics, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115, USA;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain;Center for Biomedical Informatics, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115, USA;Center for Biomedical Informatics, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115, USA
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation extraction systems cannot be developed without substantial datasets annotated with ground truth for benchmarking and training. The creation of such datasets is hampered by the absence of a resource for launching a distributed annotation effort, as well as by the lack of a standardized annotation schema. We have developed an annotation schema and an annotation tool which can be widely adopted so that the resulting annotated corpora from a multitude of disease studies could be assembled into a unified benchmark dataset. The contribution of this paper is threefold. First, we provide an overview of available benchmark corpora and derive a simple annotation schema for specific binary relation extraction problems such as protein-protein and gene-disease relation extraction. Second, we present BioNotate: an open source annotation resource for the distributed creation of a large corpus. Third, we present and make available the results of a pilot annotation effort of the autism disease network.