Periodicity testing with sublinear samples and space

Authors:
Funda Ergun;S. Muthukrishnan;Cenk Sahinalp
Affiliations:
Simon Fraser University, Burnaby, BC, Canada;Google Research, New York, NY;Simon Fraser University, Burnaby, BC, Canada
Venue:
ACM Transactions on Algorithms (TALG)
Year:
2010

Citing 8
Cited 1

Property testing and its connection to learning and approximation

Journal of the ACM (JACM)
Time series similarity measures (tutorial PM-2)

Tutorial notes of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Near-optimal sparse fourier representations via sampling

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Robust Characterizations of Polynomials with Applications to Program Testing

SIAM Journal on Computing
Overcoming Limitations of Sampling for Aggregation Queries

Proceedings of the 17th International Conference on Data Engineering
Identifying Representative Trends in Massive Time Series Data Sets Using Sketches

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
A sublinear algorithm for weakly approximating edit distance

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Testing periodicity

APPROX'05/RANDOM'05 Proceedings of the 8th international workshop on Approximation, Randomization and Combinatorial Optimization Problems, and Proceedings of the 9th international conference on Randomization and Computation: algorithms and techniques

Periodicity and cyclic shifts via linear sketches

APPROX'11/RANDOM'11 Proceedings of the 14th international workshop and 15th international conference on Approximation, randomization, and combinatorial optimization: algorithms and techniques

Abstract

In this work, we are interested in periodic trends in long data streams in the presence of computational constraints. To this end, we present algorithms for discovering periodic trends in the combinatorial property testing model in a data stream S of length n using o(n) samples and space. In accordance with the property testing model, we first explore the notion of being “close” to periodic by defining three different notions of self-distance through relaxing different notions of exact periodicity. An input S is then called approximately periodic if it exhibits a small self-distance (with respect to any one of the defined self-distances). We show that even though the different definitions of exact periodicity are equivalent, the resulting definitions of self-distance and approximate periodicity are not; we also show that these self-distances are constant approximations of each other. Afterwards, we present algorithms that distinguish between the two cases where S is exactly periodic and S is far from periodic, with only a constant probability of error. Our algorithms sample only O(√n log² n) (or O(√n log⁴ n), depending on the self-distance) positions and use as much space. They can also find, using o(n) samples and space, the largest/smallest period, and/or all of the approximate periods of S. These algorithms can also be viewed as working on streaming inputs where each data item is seen once and in order, storing only a sublinear (O(√n log² n) or O(√n log⁴ n)) size sample from which periodicities are identified.
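
To make the idea of exact versus approximate periodicity concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not spell out its three self-distances, so the sketch assumes a simple stand-in, the number of positions where S disagrees with its own shift by p. The function names (`is_exact_period`, `shift_distance`, `sampled_shift_distance`) are hypothetical, and the sampling estimator is a naive per-period illustration that makes no attempt to match the stated sample bounds.

```python
import random


def is_exact_period(s, p):
    """Return True if s[i] == s[i + p] for every valid i, i.e. p is an exact period."""
    return all(s[i] == s[i + p] for i in range(len(s) - p))


def shift_distance(s, p):
    """Count positions i with s[i] != s[i + p].

    Assumed stand-in for a self-distance: it is 0 exactly when p is an exact period.
    """
    return sum(1 for i in range(len(s) - p) if s[i] != s[i + p])


def sampled_shift_distance(s, p, num_samples, rng):
    """Estimate shift_distance(s, p) from uniformly random positions.

    Naive illustration only: it draws fresh samples for each candidate p
    and reads two stream positions per sample.
    """
    m = len(s) - p
    mismatches = 0
    for _ in range(num_samples):
        i = rng.randrange(m)
        if s[i] != s[i + p]:
            mismatches += 1
    # Scale the observed mismatch rate back up to the m compared positions.
    return mismatches * m / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "abc" * 1000
    noisy = list(clean)
    for i in rng.sample(range(len(noisy)), 30):
        noisy[i] = "z"  # corrupt 30 positions so the stream is only close to periodic
    print(is_exact_period(clean, 3), is_exact_period(noisy, 3))
    print(shift_distance(noisy, 3), sampled_shift_distance(noisy, 3, 500, rng))
```

Under this assumed distance, a stream would be treated as approximately periodic with period p when the distance is small relative to n. The estimator only shows why random sampling can separate an exactly periodic stream (estimate always 0) from one with many mismatches.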