Clustering of biological time series by cepstral coefficients based distances

Authors:
Alexios Savvides;Vasilis J. Promponas;Konstantinos Fokianos
Affiliations:
Department of Mathematics and Statistics, University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus;Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus;Department of Mathematics and Statistics, University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus
Venue:
Pattern Recognition
Year:
2008

Abstract

Clustering of stationary time series has become an important tool in many scientific applications, such as medicine and finance. Time series clustering methods rely on suitable similarity measures that quantify the distance between two or more time series. These measures are computed either in the time domain or in the spectral domain. Because computing time domain measures is rather cumbersome, we resort to spectral domain methods. We propose a new distance measure based on the so-called cepstral coefficients, which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is passed as input to a clustering method, which produces disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are then applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.
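
The pipeline described in the abstract (cepstral coefficients, then a distance, then a clustering method) can be illustrated with a minimal sketch. It does not reproduce the semiparametric estimator of the paper: as a stand-in, the coefficients are estimated naively by a cosine transform of the log periodogram. The function names `simulate_ar1` and `cepstral_coefficients`, the number of coefficients, the Euclidean distance, and the average-linkage clustering are all assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate a Gaussian AR(1) series x[t] = phi * x[t-1] + e[t]."""
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]  # drop the burn-in so the series is close to stationary


def cepstral_coefficients(x, n_coef=10):
    """Naive cepstral estimates from the log periodogram.

    Assumption: this is a simple substitute for the semiparametric
    estimator described in the abstract, not that estimator itself.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Periodogram at the nonzero Fourier frequencies 2*pi*j/n, j = 1..n-1.
    # Frequency zero is skipped because mean removal makes it (near) zero.
    freqs = 2.0 * np.pi * np.arange(1, n) / n
    periodogram = np.abs(np.fft.fft(x)[1:]) ** 2 / n
    log_spec = np.log(periodogram)
    # c_k is approximated by (1/n) * sum_j log I(w_j) * cos(k * w_j).
    k = np.arange(1, n_coef + 1)
    return np.cos(np.outer(k, freqs)) @ log_spec / n


rng = np.random.default_rng(0)
# Two groups of series with clearly different spectra.
series = [simulate_ar1(0.8, 500, rng) for _ in range(5)]
series += [simulate_ar1(-0.8, 500, rng) for _ in range(5)]

features = np.array([cepstral_coefficients(s) for s in series])
# Euclidean distance between cepstral vectors, fed to hierarchical clustering.
distances = pdist(features, metric="euclidean")
tree = linkage(distances, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

For an AR(1) process the theoretical cepstral coefficients are phi^k / k, so the two simulated groups differ mainly in the sign of the odd-order coefficients, which is what the distance picks up here. Any other clustering method that accepts a precomputed distance could replace the hierarchical step.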