Associative Clustering for Exploring Dependencies between Functional Genomics Data Sets

Authors:
Samuel Kaski;Janne Nikkila;Janne Sinkkonen;Leo Lahti;Juha E. A. Knuuttila;Christophe Roos
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2005


Abstract

High-throughput genomic measurements, interpreted as co-occurring data samples from multiple sources, open up a fresh problem for machine learning: What do the different data sets have in common, that is, what kind of statistical dependencies are there between the paired samples from the different sets? We introduce a clustering algorithm for exploring these dependencies. Samples within each data set are grouped such that the dependencies between groups of different sets capture as much of the pairwise dependencies between the samples as possible. We formalize this problem in a novel probabilistic way, as optimization of a Bayes factor. The method is applied to reveal commonalities and exceptions in gene expression between organisms and to suggest regulatory interactions in the form of dependencies between gene expression profiles and regulator binding patterns.
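
To make the idea of scoring dependencies between two clusterings with a Bayes factor concrete, the following is a minimal sketch and not the paper's implementation. It assumes that paired samples have already been assigned cluster labels in each data set, cross-tabulates the labels into a contingency table, and compares a dependent model (one distribution over all table cells) against an independent model (a product of row and column distributions). The symmetric Dirichlet priors, the parameter `alpha`, and the function names `contingency_table` and `log_bayes_factor` are assumptions made for illustration; the exact form of the Bayes factor optimized in the paper may differ.

```python
from math import lgamma


def contingency_table(labels_x, labels_y, n_x, n_y):
    """Count how often cluster i of one set co-occurs with cluster j of the other."""
    table = [[0] * n_y for _ in range(n_x)]
    for i, j in zip(labels_x, labels_y):
        table[i][j] += 1
    return table


def _log_marginal(counts, alpha):
    """Log marginal likelihood of an observed label sequence with the given
    category counts, under a symmetric Dirichlet(alpha) prior (assumed prior)."""
    k = len(counts)
    n = sum(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))


def log_bayes_factor(table, alpha=1.0):
    """Log Bayes factor of 'clusters are dependent' versus 'clusters are independent'.

    Positive values favor dependency between the two clusterings.
    """
    cells = [c for row in table for c in row]
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    log_dependent = _log_marginal(cells, alpha)
    log_independent = _log_marginal(row_sums, alpha) + _log_marginal(col_sums, alpha)
    return log_dependent - log_independent


# Hypothetical toy labels for 100 paired samples, two clusters per data set.
x = [0] * 50 + [1] * 50
y_aligned = list(x)        # cluster membership agrees across the two sets
y_unrelated = [0, 1] * 50  # cluster membership carries no information about x

score_aligned = log_bayes_factor(contingency_table(x, y_aligned, 2, 2))
score_unrelated = log_bayes_factor(contingency_table(x, y_unrelated, 2, 2))
print(score_aligned > 0, score_unrelated < 0)
```

In a clustering procedure of the kind the abstract describes, a score like this would serve as the objective: cluster assignments in each data set would be adjusted to increase it. How that optimization is carried out is not specified in the abstract and is not shown here.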