Manual curation is not sufficient for annotation of genomic databases

Authors:
William A. Baumgartner;K. Bretonnel Cohen;Lynne M. Fox;George Acquaah-Mensah;Lawrence Hunter
Venue:
Bioinformatics
Year:
2007

Abstract

Motivation: Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents.

Results: Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes.

Contact: larry.hunter@uchsc.edu
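
The found/fixed graph named in the abstract can be illustrated with a minimal sketch. In software engineering it is usually drawn as two cumulative curves over time, one for problems found and one for problems fixed, with the gap between them showing outstanding work. The mapping used here is an assumption for illustration, not necessarily the paper's exact procedure: "found" is taken as entries newly identified as lacking an annotation and "fixed" as entries that received one. The function name and the per-release counts are hypothetical.

```python
from itertools import accumulate


def found_fixed_series(found_per_period, fixed_per_period):
    """Return cumulative found, cumulative fixed, and the open gap per period.

    Assumption: each input lists counts for consecutive periods
    (for example, successive knowledge base releases).
    """
    if len(found_per_period) != len(fixed_per_period):
        raise ValueError("both series must cover the same periods")
    cum_found = list(accumulate(found_per_period))
    cum_fixed = list(accumulate(fixed_per_period))
    # The gap is the work still outstanding at the end of each period.
    gap = [f - x for f, x in zip(cum_found, cum_fixed)]
    return cum_found, cum_fixed, gap


# Hypothetical counts per release; these are NOT data from the paper.
found = [100, 80, 60, 40]   # entries newly found to lack an annotation
fixed = [20, 30, 30, 30]    # entries annotated in the same release

cum_found, cum_fixed, gap = found_fixed_series(found, fixed)
for period, (f, x, g) in enumerate(zip(cum_found, cum_fixed, gap), start=1):
    print(f"release {period}: found={f} fixed={x} open={g}")
```

In this made-up series the open gap keeps growing, which is the kind of pattern that would indicate a process not converging on completion. A gap shrinking toward zero would indicate the opposite.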