Validating associations in biological databases

  • Authors:
  • Francisco M. Couto;Mário J. Silva;Pedro M. Coutinho

  • Affiliations:
  • University of Lisboa;University of Lisboa;Centre National de la Recherche Scientifique

  • Venue:
  • CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
  • Year:
  • 2006

Abstract

Erroneous data can often be found in databases, and detecting it is normally a non-trivial task. For example, to cope with the large number of biological sequences being produced, a significant number of genes and proteins have been annotated by automated tools. A protein annotation is an association between a protein and a term describing its role. These tools have produced a significant number of misannotations that are now present in biological databases. This paper proposes a new method for automatically scoring associations by comparing them to preexisting curated associations. An association is a pair that links two entities. The score can be used to filter incorrect or uncommon associations. We evaluated the method using the automated protein annotations submitted to BioCreAtIvE, an international evaluation of state-of-the-art text-mining systems in Biology. The method scored each of these annotations, and those scored below a certain threshold were discarded. The results show a small trade-off in recall for a large improvement in precision. For example, we were able to discard 44.6%, 66.8%, and 81% of the misannotations while maintaining 96.9%, 84.2%, and 47.8% of the correct annotations, respectively. Moreover, we were able to outperform each individual submission to BioCreAtIvE by proper adjustment of the threshold.
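
The threshold filtering step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's scoring method: the scores are assumed to be given, and the names `ScoredAnnotation`, `filter_by_threshold`, and `evaluate_filter`, as well as the toy data, are hypothetical. The sketch only shows how raising the threshold trades retained correct annotations against discarded misannotations.

```python
from typing import NamedTuple


class ScoredAnnotation(NamedTuple):
    """Hypothetical record: a protein-term association with a precomputed score."""
    protein: str
    term: str
    score: float       # assumed to come from some scoring method (not shown here)
    is_correct: bool   # gold-standard judgement, used only for evaluation


def filter_by_threshold(annotations, threshold):
    """Discard annotations scored below the threshold; keep the rest."""
    return [a for a in annotations if a.score >= threshold]


def evaluate_filter(annotations, threshold):
    """Report the fraction of correct annotations kept, the fraction of
    misannotations discarded, and the precision of the kept set."""
    kept = filter_by_threshold(annotations, threshold)
    total_correct = sum(a.is_correct for a in annotations)
    total_wrong = len(annotations) - total_correct
    kept_correct = sum(a.is_correct for a in kept)
    kept_wrong = len(kept) - kept_correct
    return {
        "correct_kept": kept_correct / total_correct if total_correct else 0.0,
        "wrong_discarded": (total_wrong - kept_wrong) / total_wrong if total_wrong else 0.0,
        "precision": kept_correct / len(kept) if kept else 0.0,
    }


# Toy data, invented for illustration only (not taken from BioCreAtIvE).
toy_annotations = [
    ScoredAnnotation("P1", "term_a", 0.9, True),
    ScoredAnnotation("P2", "term_b", 0.7, True),
    ScoredAnnotation("P3", "term_c", 0.6, False),
    ScoredAnnotation("P4", "term_d", 0.3, True),
    ScoredAnnotation("P5", "term_e", 0.2, False),
    ScoredAnnotation("P6", "term_f", 0.1, False),
]

for threshold in (0.25, 0.5):
    print(threshold, evaluate_filter(toy_annotations, threshold))
```

Sweeping the threshold over a range of values in this way yields the kind of operating points the abstract reports, where each threshold pairs a share of discarded misannotations with a share of retained correct annotations.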