Measuring coherence between electronic and manual annotations in biological databases

  • Authors:
  • Catia Pesquita;Daniel Faria;Francisco M. Couto

  • Affiliations:
  • University of Lisbon;University of Lisbon;University of Lisbon

  • Venue:
  • Proceedings of the 2009 ACM symposium on Applied Computing
  • Year:
  • 2009

Abstract

The use of controlled, structured vocabularies such as the Gene Ontology (GO) for annotation is currently one of the strategies for coping with the increasingly cumbersome task of genome annotation. The Gene Ontology Annotation Database (GOA) uses GO to annotate gene products both through curated literature analysis and through uncurated electronic methods. Although electronic annotations make up the large majority of annotations (over 95%), most researchers are reluctant to use them in their studies because they are regarded as being of lower quality than curated ones. Assessing the quality of electronic annotations may help clarify the advantages and disadvantages of using them. This paper proposes a preliminary measure of electronic annotation quality based on the coherence between electronic and manual annotations. Coherence is analysed at both the gene product level and the annotation level, based on the semantic similarity of Gene Ontology terms. We found that average annotation coherence values are around 60%, but can reach 81% in a less granular analysis. Based on this analysis, we propose meaningful coherence thresholds for selecting and filtering electronic annotations, and for highlighting gene products for annotation revision.
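The abstract does not give the coherence formulas, so the following is only a minimal sketch of how coherence between electronic and manual annotations could be computed, under stated assumptions. It assumes that annotation-level coherence is the best semantic similarity between one electronic term and any manual term of the same gene product, and that gene-product coherence is the mean of those values. The term similarity used here (Jaccard overlap of ancestor sets) is a simple stand-in and is not necessarily the measure used in the paper; the ontology fragment, term names, and threshold are hypothetical.

```python
from statistics import mean

# Hypothetical toy ontology fragment: each term maps to its parent terms.
ONTOLOGY = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
    "kinase": ["catalytic"],
}


def ancestors(term, ontology=ONTOLOGY):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(ontology[t])
    return seen


def term_similarity(t1, t2):
    """Assumed stand-in measure: Jaccard overlap of the two ancestor sets."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)


def annotation_coherence(electronic_term, manual_terms):
    """Best match between one electronic term and the gene product's manual terms."""
    return max(term_similarity(electronic_term, m) for m in manual_terms)


def gene_product_coherence(electronic_terms, manual_terms):
    """Assumed aggregation: mean annotation coherence over the electronic terms."""
    return mean(annotation_coherence(e, manual_terms) for e in electronic_terms)


def filter_electronic(electronic_terms, manual_terms, threshold):
    """Keep electronic annotations whose coherence reaches the threshold."""
    return [e for e in electronic_terms
            if annotation_coherence(e, manual_terms) >= threshold]


if __name__ == "__main__":
    manual = ["protein_binding"]
    electronic = ["protein_binding", "dna_binding", "kinase"]
    print(gene_product_coherence(electronic, manual))
    print(filter_electronic(electronic, manual, 0.5))  # → ['protein_binding', 'dna_binding']
```

In this toy case, the electronic term `kinase` shares only the root with the manual term and scores 0.2, so a hypothetical threshold of 0.5 filters it out, while `dna_binding` (a sibling under `binding`) scores 0.5 and is kept.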