Segmentation partitions sequential data into homogeneous regions and plays a central role in data mining. Its applications range from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active fields of application. In this paper, we present a novel application of segmentation: locating genomic regions that contain coexpressed genes. We aim to discover such regions automatically, without requiring user-supplied parameters. To perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require a decision on the number of segments. This decision is usually made with asymptotic model selection methods such as the Bayesian information criterion. Such methods rely on simplifying assumptions, which can limit their applicability. We therefore propose a Bayesian model selection method for choosing the most appropriate result from a heuristic segmentation. Our Bayesian model combines a simple prior over segmentation solutions with different numbers of segments and a modified Dirichlet prior for modeling multinomial data. Using various artificial data sets in our benchmark system, we show that our model selection criterion has the best overall performance. Applying our method to yeast cell-cycle gene expression data reveals potentially active and passive regions of the genome.
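To make the model selection idea concrete, the following minimal sketch scores candidate segmentations of a categorical sequence by their log marginal likelihood under a standard symmetric Dirichlet prior and keeps the highest-scoring one. It is an illustration only, not the method of the paper: the modified Dirichlet prior and the prior over segment numbers described above are not reproduced here, and the function names, the `alpha` hyperparameter, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import lgamma


def log_marginal_segment(segment, num_symbols, alpha=1.0):
    """Log marginal likelihood of one segment of integer symbols 0..num_symbols-1.

    Assumes a multinomial model with a symmetric Dirichlet(alpha) prior
    (a textbook choice, not the modified prior used in the paper).
    """
    counts = Counter(segment)
    n = len(segment)
    score = lgamma(num_symbols * alpha) - lgamma(n + num_symbols * alpha)
    for symbol in range(num_symbols):
        score += lgamma(counts.get(symbol, 0) + alpha) - lgamma(alpha)
    return score


def score_segmentation(sequence, boundaries, num_symbols, alpha=1.0):
    """Sum the segment scores; `boundaries` lists the start index of each new segment."""
    edges = [0] + sorted(boundaries) + [len(sequence)]
    return sum(
        log_marginal_segment(sequence[start:end], num_symbols, alpha)
        for start, end in zip(edges, edges[1:])
    )


def pick_best(sequence, candidates, num_symbols, alpha=1.0):
    """Return the candidate boundary list with the highest total score."""
    return max(
        candidates,
        key=lambda b: score_segmentation(sequence, b, num_symbols, alpha),
    )


# Hypothetical toy data: 20 symbols of one kind followed by 20 of another.
toy_sequence = [0] * 20 + [1] * 20
# Candidates as they might come from a heuristic segmenter: 1, 2, and 4 segments.
toy_candidates = [[], [20], [10, 20, 30]]
best = pick_best(toy_sequence, toy_candidates, num_symbols=2)
```

Because the marginal likelihood integrates over the segment parameters, it penalizes superfluous segments on its own in this toy case. A fuller treatment would also place an explicit prior on the number of segments, as the abstract describes.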