Finding recurrent sources in sequences

Authors:
Aristides Gionis; Heikki Mannila
Affiliations:
Stanford University, Stanford, CA; University of Helsinki, Helsinki, Finland
Venue:
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Year:
2003

Abstract

Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, it is often reasonable to assume that the sequence is actually generated by or assembled from a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k > h pieces, each of which stems from one of the h sources. We define this (k,h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.
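
To make the (k,h)-segmentation problem described in the abstract concrete, here is a minimal sketch for one-dimensional data under squared (L2) error. The abstract does not spell out the algorithms, so this should not be read as the authors' method: it is one natural two-stage heuristic that assumes we first compute an optimal k-segmentation by dynamic programming and then group the k segment means into h source levels. All function names (k_segmentation, cluster_means, segment_then_cluster) are hypothetical, and no approximation guarantee is claimed for this code.

```python
from itertools import combinations


def k_segmentation(xs, k):
    """Optimal split of xs into k contiguous pieces under squared error.

    Returns a list of half-open (start, end) index pairs.
    """
    n = len(xs)
    # Prefix sums of values and squares give O(1) segment costs.
    s, q = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        q.append(q[-1] + x * x)

    def sse(i, j):
        # Squared error of xs[i:j] when represented by its mean.
        total = s[j] - s[i]
        return (q[j] - q[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i

    # Walk the back-pointers from the end to recover the boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


def cluster_means(means, weights, h):
    """Assign each segment mean to one of h levels (assumes h <= len(means)).

    In one dimension an optimal grouping is contiguous in sorted order,
    so for small k we can simply try every way to cut the sorted means.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    best = None
    for cuts in combinations(range(1, len(order)), h - 1):
        edges = (0, *cuts, len(order))
        levels, err = [], 0.0
        for a, b in zip(edges, edges[1:]):
            group = order[a:b]
            w = sum(weights[i] for i in group)
            level = sum(weights[i] * means[i] for i in group) / w
            err += sum(weights[i] * (means[i] - level) ** 2 for i in group)
            levels.append(level)
        if best is None or err < best[0]:
            labels = [0] * len(means)
            for src, (a, b) in enumerate(zip(edges, edges[1:])):
                for i in order[a:b]:
                    labels[i] = src
            best = (err, labels, levels)
    return best[1], best[2]


def segment_then_cluster(xs, k, h):
    """Two-stage heuristic: k-segmentation first, then h source levels."""
    bounds = k_segmentation(xs, k)
    means = [sum(xs[i:j]) / (j - i) for i, j in bounds]
    weights = [j - i for i, j in bounds]
    labels, levels = cluster_means(means, weights, h)
    # Total squared error when every point is replaced by its source level.
    cost = sum(
        (x - levels[lab]) ** 2
        for (i, j), lab in zip(bounds, labels)
        for x in xs[i:j]
    )
    return bounds, labels, levels, cost
```

For example, on the toy sequence 0, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5 with k = 4 and h = 2, the sketch returns four segments of length three whose source labels alternate between two levels, 0 and 5, with zero error. The brute-force search in cluster_means is exponential in h and is only meant for small k; a real implementation would swap in a proper clustering routine.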