An effective soft clustering approach to mining gene expressions from multi-source databases

  • Authors:
  • Chien-I Lee;Hsiu-Min Chuang

  • Affiliations:
  • Department of Information and Learning Technology, National University of Tainan, Tainan City, Taiwan;Department of Information and Learning Technology, National University of Tainan, Tainan City, Taiwan

  • Venue:
  • AIKED'07 Proceedings of the 6th Conference on 6th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases - Volume 6
  • Year:
  • 2007

Abstract

In recent years, many technologies for analyzing genes have been proposed. A large number of biological databases, covering microarray data, biomedical literature, sequence data, genome structure data and others, now form useful data warehouses for mining gene-gene relations and predicting gene networks. In bioinformatics, clustering gene expression data is a common technique for extracting new knowledge. However, raising the accuracy of gene clusters is a challenge because of errors in biological databases and disagreement among clustering methods. In this paper, Multi-Source Soft Clustering (MSSC), a framework that integrates clustering methods with multi-source databases, is presented to raise that accuracy. Two soft clustering methods, fuzzy c-means and soft CAST, are applied to address the fact that a gene may have multiple functions and take part in several biological pathways. Combining microarray data with biomedical literature may yield better overall accuracy than using a single dataset. In addition, MSSC adopts the concept of clustering before integrating, and uses the statistical correlation coefficient to calculate the distances between the matrices of the diverse soft clustering results. The experimental results show that the MSSC approach can be relatively more effective.
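
The abstract does not spell out how the correlation coefficient is applied to the soft clustering results, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each soft clustering result is a gene-by-cluster membership matrix (rows sum to 1), converts each result into a gene-by-gene co-membership matrix so that the order of cluster labels does not matter, and takes 1 minus the Pearson correlation of the corresponding entries as the distance. The function names and the co-membership step are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def co_membership(memberships):
    """Gene-by-gene matrix: entry (i, j) is the sum over clusters of u_ik * u_jk.

    Assumption: `memberships` is a list of rows, one per gene, holding that
    gene's soft membership degree in each cluster.
    """
    n = len(memberships)
    return [
        [sum(a * b for a, b in zip(memberships[i], memberships[j])) for j in range(n)]
        for i in range(n)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def clustering_distance(u_a, u_b):
    """Distance between two soft clusterings of the same genes (assumed: 1 - r)."""
    c_a = co_membership(u_a)
    c_b = co_membership(u_b)
    # Compare each unordered gene pair once (upper triangle, diagonal excluded).
    pairs = list(combinations(range(len(u_a)), 2))
    xs = [c_a[i][j] for i, j in pairs]
    ys = [c_b[i][j] for i, j in pairs]
    return 1.0 - pearson(xs, ys)


# Illustrative memberships for four genes and two clusters (made-up values).
u_microarray = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
# Same grouping with the cluster labels swapped.
u_relabelled = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
# A different grouping of the same genes.
u_literature = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.1, 0.9]]

d_same = clustering_distance(u_microarray, u_relabelled)
d_diff = clustering_distance(u_microarray, u_literature)
```

Under these assumptions, two results that group the genes identically yield a distance near zero even when their cluster labels differ, while results that group the genes differently yield a larger distance, which is one way the outputs of fuzzy c-means and soft CAST on different sources could be compared before integration.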