Heuristic Bayesian Segmentation for Discovery of Coexpressed Genes within Genomic Regions

Authors:
Petri Pehkonen;Garry Wong;Petri Toronen
Affiliations:
University of Kuopio, Kuopio;University of Kuopio, Kuopio;University of Helsinki, Kuopio
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010


Abstract

Segmentation aims to separate homogeneous regions within sequential data and plays a central role in data mining. Its applications range from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation: locating genomic regions that contain coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. To perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require a decision on the number of segments. This decision is usually made with asymptotic model selection methods such as the Bayesian information criterion. Such methods are based on simplifications, which can limit their applicability. In this paper, we propose a Bayesian model selection method for choosing the most appropriate result from a heuristic segmentation. Our Bayesian model uses a simple prior over segmentation solutions with different numbers of segments and a modified Dirichlet prior for modeling multinomial data. Using various artificial data sets in our benchmark system, we show that our model selection criterion has the best overall performance. Applying our method to yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
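
To make the idea of Bayesian model selection among candidate segmentations concrete, the following is a minimal sketch, not the authors' method. It assumes a standard symmetric Dirichlet prior (the abstract's "modified" Dirichlet prior is not specified here) and an illustrative prior that is uniform over the number of segments and over the placement of cut points. The function names `log_marginal`, `log_score`, and `best_segmentation`, and the parameter `alpha`, are hypothetical.

```python
from collections import Counter
from math import comb, lgamma, log


def log_marginal(segment, n_categories, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of one segment.

    Assumption: symmetric Dirichlet(alpha) prior over category probabilities,
    not the modified prior mentioned in the abstract.
    """
    counts = Counter(segment)
    n = len(segment)
    a0 = alpha * n_categories
    result = lgamma(a0) - lgamma(a0 + n)
    for c in counts.values():
        # Categories with zero count contribute lgamma(alpha) - lgamma(alpha) = 0.
        result += lgamma(alpha + c) - lgamma(alpha)
    return result


def log_score(seq, boundaries, n_categories, alpha=1.0):
    """Log prior plus log marginal likelihood of one segmentation.

    `boundaries` is a sorted list of interior cut positions.
    Assumed prior: uniform over the segment count k in 1..n, then uniform
    over the C(n-1, k-1) ways of placing k-1 cuts.
    """
    n = len(seq)
    k = len(boundaries) + 1
    log_prior = -log(n) - log(comb(n - 1, k - 1))
    edges = [0, *boundaries, n]
    log_like = sum(
        log_marginal(seq[a:b], n_categories, alpha)
        for a, b in zip(edges, edges[1:])
    )
    return log_prior + log_like


def best_segmentation(seq, candidates, n_categories, alpha=1.0):
    """Pick the candidate (e.g. from a heuristic search) with the highest score."""
    return max(candidates, key=lambda b: log_score(seq, b, n_categories, alpha))


# Toy data: two homogeneous halves over a two-letter alphabet.
seq = [0] * 20 + [1] * 20
candidates = [[], [20], [10, 20, 30]]
print(best_segmentation(seq, candidates, n_categories=2))
```

In this toy setting, the candidate lists would come from a heuristic segmentation run at several segment counts. The score then trades off fit against the number of segments without a user-supplied penalty.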