Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, it is often reasonable to assume that the sequence is actually generated by, or assembled from, a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k > h pieces, each of which stems from one of the h sources. We define this (k,h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.
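To make the problem statement concrete, here is a minimal Python sketch of a segment-then-cluster heuristic for the one-dimensional case under squared (L2-style) error. It is an illustration under stated assumptions, not necessarily the algorithm of the paper, and it makes no claim to attain the approximation ratios quoted above. The function names (`k_segmentation`, `kh_segmentation`) are hypothetical. The sketch first computes an optimal k-segmentation by dynamic programming, then groups the k segment means into h levels (the "sources"), and finally reports the error of describing every point by the level of its segment.

```python
def k_segmentation(x, k):
    """Optimal split of x into k contiguous pieces under squared error.

    Returns k + 1 boundary indices, starting at 0 and ending at len(x).
    Assumes 1 <= k <= len(x). Runs in O(k * n^2) time.
    """
    n = len(x)
    # Prefix sums let us compute the cost of any piece in O(1).
    pre, pre_sq = [0.0], [0.0]
    for v in x:
        pre.append(pre[-1] + v)
        pre_sq.append(pre_sq[-1] + v * v)

    def piece_cost(i, j):
        # Squared error of representing x[i:j] by its mean.
        s = pre[j] - pre[i]
        return (pre_sq[j] - pre_sq[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                c = cost[s - 1][i] + piece_cost(i, j)
                if c < cost[s][j]:
                    cost[s][j] = c
                    back[s][j] = i

    # Walk the back-pointers from the end to recover the boundaries.
    bounds, j = [n], n
    for s in range(k, 0, -1):
        j = back[s][j]
        bounds.append(j)
    return bounds[::-1]


def kh_segmentation(x, k, h):
    """Heuristic (k,h)-segmentation: segment first, then cluster the means.

    Returns (bounds, labels, levels, error), where labels[i] is the
    source index of segment i and levels[c] is the value of source c.
    """
    bounds = k_segmentation(x, k)
    segs = [x[a:b] for a, b in zip(bounds, bounds[1:])]
    means = [sum(s) / len(s) for s in segs]

    # Cluster the segment means, weighted by segment length. In one
    # dimension an optimal clustering is contiguous in sorted order, so
    # the same dynamic program can be reused on the sorted values.
    flat = sorted(m for m, s in zip(means, segs) for _ in s)
    cb = k_segmentation(flat, h)
    levels = [sum(flat[a:b]) / (b - a) for a, b in zip(cb, cb[1:])]

    # Assign each whole segment to its nearest level.
    labels = [min(range(h), key=lambda c: abs(levels[c] - m)) for m in means]
    error = sum((v - levels[lab]) ** 2
                for s, lab in zip(segs, labels) for v in s)
    return bounds, labels, levels, error
```

On the toy sequence `[1, 1, 1, 5, 5, 5, 1, 1, 1, 5, 5, 5]` with k = 4 and h = 2, this sketch places boundaries at 0, 3, 6, 9, 12 and labels the segments 0, 1, 0, 1 with zero error, so two sources each contribute two segments. Fixing the boundaries before choosing the sources keeps both steps polynomial, but the result is in general only a heuristic solution to the joint problem.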