Mutual information clustering for efficient mining of fuzzy association rules with application to gene expression data analysis

  • Authors:
  • Stergios Papadimitriou;Seferina Mavroudi;Spiridon Likothanassis

  • Affiliations:
  • Department of Information Management, Technological Educational Institute of Kavala, Kavala, Greece;Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Rion, Patras, Greece;Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Rion, Patras, Greece

  • Venue:
  • ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers
  • Year:
  • 2005

Abstract

Extracting fuzzy association rules that describe dependencies and interactions in large data sets, such as those arising in gene expression data analysis, poses very difficult combinatorial problems whose complexity depends heavily on the size of these sets. The paper describes a two-stage approach that obtains computationally manageable solutions. The first stage clusters transactions that are more likely to be associated. The second stage, fuzzy association rule extraction, then confronts a significantly reduced problem. The clustering phase is accomplished by means of a Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). The mutual information metric controls the development of the KSDG-SOM clusters: it allows the formation of data clusters that maximize the mutual information among transactions of the same cluster and minimize it between different clusters. In addition, the KSDG-SOM can incorporate a priori information about the items of the transactions, which can focus the model on clustering together items that are even more likely to be associated. After this initial data clustering, we concentrate on whether the pattern of a transaction can be associated with characteristics of the patterns of other transactions of the same node. Therefore, the fuzzy association rules are extracted locally, on a per-cluster basis. The paper focuses on applying these techniques to the mining of gene expression data. However, the presented techniques can easily be adapted and may prove fruitful for the intelligent exploration of other data sets as well.
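
The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' KSDG-SOM or their rule miner. It assumes that expression profiles have already been discretized into labels, that mutual information is estimated with a simple plug-in (frequency count) estimator, and that fuzzy support and confidence use the minimum t-norm. The abstract specifies none of these choices, and the function names `mutual_information` and `fuzzy_support_confidence` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length label sequences.

    Assumption: profiles were discretized beforehand (e.g. "low"/"high").
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def fuzzy_support_confidence(antecedent, consequent):
    """Fuzzy support and confidence of a rule A -> B within one cluster.

    Each argument holds one membership degree in [0, 1] per transaction.
    Assumption: the conjunction "A and B" is modeled by the minimum t-norm.
    """
    if len(antecedent) != len(consequent) or len(antecedent) == 0:
        raise ValueError("membership lists must be non-empty and of equal length")
    joint = sum(min(a, b) for a, b in zip(antecedent, consequent))
    total_antecedent = sum(antecedent)
    support = joint / len(antecedent)
    confidence = joint / total_antecedent if total_antecedent > 0 else 0.0
    return support, confidence


# Identical profiles share maximal information; unrelated ones share none.
same = mutual_information(["lo", "lo", "hi", "hi"], ["lo", "lo", "hi", "hi"])
unrelated = mutual_information(["lo", "lo", "hi", "hi"], ["lo", "hi", "lo", "hi"])

# Rule "gene A is high -> gene B is high", evaluated on one cluster only.
support, confidence = fuzzy_support_confidence(
    [1.0, 0.5, 0.0, 0.5],  # membership of each transaction in "A is high"
    [1.0, 0.5, 1.0, 0.0],  # membership of each transaction in "B is high"
)
```

In a two-stage scheme of the kind described, a measure like the first function would guide which transactions end up in the same cluster, and the second would then be evaluated separately inside each cluster, so that candidate rules are enumerated over far fewer transactions at a time.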