Ontology-Based genes similarity calculation with TF-IDF

  • Authors: Yue Huang; Mingxin Gan; Rui Jiang
  • Affiliations: School of Economics and Management, University of Science and Technology Beijing, Beijing, P.R. China; School of Economics and Management, University of Science and Technology Beijing, Beijing, P.R. China; Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, P.R. China
  • Venue: ICICA'12 Proceedings of the Third International Conference on Information Computing and Applications
  • Year: 2012

Abstract

The Gene Ontology (GO) provides a controlled vocabulary of terms for describing genes from different data resources. In this paper, we propose a novel method for determining the semantic similarity of genes based on GO. The key principle of our method is the use of Term Frequency (TF) and Inverse Document Frequency (IDF) to quantify the weights of the different GO terms annotated to the same gene. Unlike previous leading methods, our method requires no parameters and computes gene similarity directly, rather than computing term similarity first. Experimental results from clustering genes in biological pathways from the Saccharomyces Genome Database (SGD) demonstrate that our method is competitive and outperforms leading methods in certain cases.
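
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: each gene is treated as a "document" whose "words" are its GO term annotations, TF is the relative frequency of a term within a gene's annotation list, IDF is the logarithm of the number of genes divided by the number of genes carrying the term, and gene similarity is the cosine of the resulting weight vectors. The function names `tfidf_vectors` and `cosine`, the term labels `T1` to `T3`, and the gene names are hypothetical and do not come from the paper; the authors' actual weighting and similarity measure may differ.

```python
import math
from collections import Counter


def tfidf_vectors(annotations):
    """Map each gene to a dict of TF-IDF weights over its GO terms.

    annotations: dict of gene name -> list of GO term labels (repeats allowed).
    Assumption: standard textbook TF-IDF, not necessarily the paper's variant.
    """
    n_genes = len(annotations)
    # Document frequency: number of genes annotated with each term.
    df = Counter()
    for terms in annotations.values():
        df.update(set(terms))

    vectors = {}
    for gene, terms in annotations.items():
        counts = Counter(terms)
        total = sum(counts.values())
        # TF = relative frequency within the gene; IDF = log(N / df).
        vectors[gene] = {
            term: (count / total) * math.log(n_genes / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up annotations (placeholder labels, not real GO data).
toy_annotations = {
    "geneA": ["T1", "T2", "T2"],
    "geneB": ["T1", "T2"],
    "geneC": ["T1", "T3"],
}
toy_vectors = tfidf_vectors(toy_annotations)
sim_ab = cosine(toy_vectors["geneA"], toy_vectors["geneB"])
sim_ac = cosine(toy_vectors["geneA"], toy_vectors["geneC"])
```

In this toy data, `T1` is attached to every gene, so its IDF is zero and it contributes nothing to any similarity. `geneA` and `geneB` share the informative term `T2` and come out as highly similar, while `geneA` and `geneC` share only `T1` and score zero. The sketch compares genes directly from their term weights, without first computing a term-to-term similarity, which is the property the abstract emphasizes.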